Abstract

We study the asymptotic behavior of a sum of independent and identically distributed random variables conditioned on a sum of independent and identically distributed integer-valued random variables. We prove a Berry-Esseen bound in a general setting and a large deviation result when the Laplace transform of the underlying distribution is not defined in a neighborhood of zero. Then we present several combinatorial applications. In particular, we prove a large deviation result for the model of hashing with linear probing.
1 Introduction

As pointed out by Svante Janson in his seminal work [13], in many random combinatorial problems the interesting statistic is a sum of independent and identically distributed (i.i.d.) random variables conditioned on some exogenous integer random variable. In general, this exogenous random variable is itself a sum of integer-valued random variables. A general framework for this kind of problem may be formalized as follows. Throughout the paper, $\mathbb{N}^*$ denotes the set $\{1, 2, \dots\}$ of positive integers, $\mathbb{N} = \mathbb{N}^* \cup \{0\}$, and $\mathbb{Z}$ is the set of all integers. Let $(k_n)_{n\in\mathbb{N}^*}$ be a sequence of integers and $(N_n)_{n\in\mathbb{N}^*}$ be a sequence of positive integers.

Further, let $(X_j^{(n)}, Y_j^{(n)})_{n\in\mathbb{N}^*, j=1,\dots,N_n}$ be a triangular array of pairs of random variables such that each line contains i.i.d. copies of a pair $(X^{(n)}, Y^{(n)})$ of random variables. Moreover, it is assumed that the elements of the array $(X_j^{(n)})_{n\in\mathbb{N}^*, j=1,\dots,N_n}$ are integers. We are interested in the law of $N_n^{-1}T_n := N_n^{-1}\sum_{j=1}^{N_n} Y_j^{(n)}$ conditioned on a specific value of $S_n := \sum_{j=1}^{N_n} X_j^{(n)}$, that is to say in the conditional distribution

$$\mathcal{L}_n := \mathcal{L}\bigl(N_n^{-1}T_n \mid S_n = k_n\bigr).$$

The motivation for considering distributions of $(X^{(n)}, Y^{(n)})$ that depend on $n$ comes from the discrete nature of the problem, which can lead to a degenerate conditional law as soon as $\mathbb{P}(S_n = k_n) = 0$. Nevertheless, in many applications (e.g., the occupancy problem or hashing; see [13]), the distribution of the conditioning random variable $X$ depends on a parameter $\lambda$ that can be freely chosen: for example, $\lambda > 0$ is the parameter of a Poisson distribution in the occupancy problem and $\lambda \in\, ]0, e^{-1}]$ is the parameter of the Borel distribution for hashing. One can take advantage of this fact to overcome contexts in which $\mathbb{P}(S_n = k_n) = 0$, proceeding as follows. Consider a triangular array $(X_j^{(n)}, Y_j^{(n)})_{n\in\mathbb{N}^*, j=1,\dots,N_n}$ such that $(X^{(n)}, Y^{(n)})$ converges weakly to $(X, Y)$. Then choose a sequence of parameters $\lambda_n \to \lambda$ such that, for any $n$, $\mathbb{P}(S_n = k_n) > 0$.
In his work, Janson proves a general central limit theorem (with convergence of all moments) for this kind of conditional distribution under some reasonable assumptions and gives several applications in classical combinatorial problems: occupancy in urns, hashing with linear probing, random forests, branching processes, etc. Following this work, at least two natural questions arise:

1. is it possible to obtain a general Berry-Esseen bound for these models? 

2. is it possible to obtain a general large deviation result for these models? 

A Berry-Esseen theorem is given by Quine and Robinson [25]. In their work, the authors study the particular case of the occupancy problem, where the random variables $X^{(n)}$ are Poisson distributed and $Y^{(n)} = \mathbb{1}_{\{X^{(n)} = 0\}}$. To our knowledge, it is the only result in that direction for this kind of conditional distribution. In our work, we prove a general Berry-Esseen bound (Theorem 2.1) that covers all the examples presented by Janson [13].

When the distribution of $(X^{(n)}, Y^{(n)})$ does not depend on $n$, the Gibbs conditioning principle ([28, 4, 5]) states that $\mathcal{L}_n$ converges weakly to the degenerate distribution concentrated at a point $y$ depending on the conditioning value (see [9, Corollary 2.2]). Around the Gibbs conditioning principle, general limit theorems yielding the asymptotic behavior of the conditioned sum are given in [27, 11, 18], and asymptotic expansions are proved in [10, 26]. In this paper, our aim is to prove a large deviation result for $\mathcal{L}_n$ when the joint Laplace transform of $(X^{(n)}, Y^{(n)})$ is not defined everywhere: we give an exponential equivalent for this conditional distribution.

The case when the Laplace transform is defined has been treated by Gamboa, Klein and Prieur [9]. They prove a large (and a moderate) deviation principle under some strong assumptions. The most restrictive assumption states that the joint Laplace transform of $(X^{(n)}, Y^{(n)})$ is finite at least in a neighborhood of $(0, 0)$. Unfortunately, this assumption fails for the most interesting example presented in [13]: hashing with linear probing. In this case, the joint Laplace transform is only defined in $]-\infty, a] \times\, ]-\infty, 0]$ for some positive $a$. It is then natural to extend the work of [9] to such distributions. In [21, 22], Nagaev establishes large deviation results for sums of random variables which are absolutely continuous with respect to the Lebesgue measure and whose Laplace transform is not defined in a neighborhood of 0. Following this work, we prove a large deviation result (Theorem 2.4).

Let us point out the main differences between Theorem 2.4 of the present work and Theorem 2.1 of [9]. First, the proof in [9] is based on a sharp control of a Fourier-Laplace transform $\Lambda^{(n)}(\tau, u) := \mathbb{E}\bigl(\exp[i\tau X^{(n)} + uY^{(n)}]\bigr)$. The Fourier part allows one to treat the conditioning, whereas the Laplace part allows one to apply the Gärtner-Ellis theorem. In the present paper, the proof follows ideas borrowed from [21, 22]. More precisely, contrary to the case when the Laplace transform is defined, the large deviations of the sum of random variables with heavy-tailed distributions are due to exceptional values taken by a few random variables. Second, unlike the classical speed $N_n$ obtained either in Cramér's theorem or in Theorem 2.1 of [9], the speed in this paper is $\sqrt{N_n}$. Third, one originality of our work is that the lower and upper bounds may differ (see equations (6) and (7)). When the Laplace transform is defined, the tails are controlled (see Cramér's theorem or the Gärtner-Ellis theorem in [5]) and the sum satisfies a large deviation principle with the same lower and upper bounds. Here, as opposed to these classical theorems, one may allow oscillations of the tails (in a controlled range), which lead to a large deviation result with two different bounds. Last but not least, the rate function obtained is not affected by the conditioning variable: the rate functions are the same in the conditional case and in the unconditional one (see Theorems 2.4 and 2.6). On the contrary, when the Laplace transform is defined in a neighborhood of the origin, the rate function strongly depends on the dependence between $X^{(n)}$ and $Y^{(n)}$. It is $y \mapsto \psi_{(X^{(n)}, Y^{(n)})}(\lambda, y) - \psi_{X^{(n)}}(\lambda)$ (where $\lambda$ is the limit of the ratio $k_n/N_n$), the difference between the joint Fenchel-Legendre transform and the Fenchel-Legendre transform of the conditioning random variable $X^{(n)}$. This rate function is $y \mapsto \psi_{Y^{(n)}}(y)$ when the conditioning term is ineffective, that is to say when the random variables $X^{(n)}$ and $Y^{(n)}$ are independent.

As pointed out by Janson in [13], hashing with linear probing was the motivating example for his work (see section 3 for a complete description of the model). This model comes from theoretical computer science, where it models the time cost of storing data in memory. It was then introduced in a mathematical framework by Knuth [16]. Due to its strong connection with parking functions and with the Airy distribution (i.e., the law of the area under the Brownian excursion), this model has been studied by many authors (see, e.g., Flajolet, Poblete and Viola [8], Janson [12, 14, 15], Chassaing, Janson, Louchard and Marckert [2, 1, 3], and Marckert [20]). Theorem 2.4 allows us to treat the interesting example of hashing with linear probing: Proposition 3.3 is the formulation of Theorem 2.4 in this particular framework.

The paper is organized as follows. In section 2, we present the general model and give our two main 
theorems. First we prove a Berry-Esseen bound (Theorem 2.1) and show how it straightforwardly applies to 
the examples presented by Janson [13]. Second we establish a large deviation result (Theorem 2.4). Section 
3 is devoted to the study of hashing with linear probing. Finally, we prove our main results in the last 
section. 


2 Main results 

2.1 Framework and notation 

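For all $n \ge 1$, we consider a pair of random variables $(X^{(n)}, Y^{(n)})$ such that $X^{(n)}$ is integer-valued and $Y^{(n)}$ is real-valued. Let $N_n$ be a natural number such that $N_n \to +\infty$ as $n$ goes to infinity. Let $(X_i^{(n)}, Y_i^{(n)})$ $(i = 1, 2, \dots, N_n)$ be an i.i.d. sample distributed as $(X^{(n)}, Y^{(n)})$ and define

$$S_n := \sum_{i=1}^{N_n} X_i^{(n)} \quad\text{and}\quad T_n := \sum_{i=1}^{N_n} Y_i^{(n)}.$$

Let $k_n \in \mathbb{Z}$ be such that $\mathbb{P}(S_n = k_n) > 0$ and let $U_n$ be a random variable distributed as $T_n$ conditioned on $S_n = k_n$. We establish a Berry-Esseen bound and a large deviation result for $(U_n)_{n\ge1}$.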


2.2 Conditional Berry-Esseen bound

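Theorem 2.1. Suppose that there exist positive constants $\underline{c}_1$, $\overline{c}_1$, $c_2$, $\underline{c}_3$, $\overline{c}_3$, $c_4$, $c_5$, and $c_6$ such that:

(H2.1.1) $\underline{c}_1 \le \sigma_{X^{(n)}} := \mathrm{Var}(X^{(n)})^{1/2} \le \overline{c}_1$;

(H2.1.2) $\rho_{X^{(n)}} := \mathbb{E}\bigl[|X^{(n)} - \mathbb{E}[X^{(n)}]|^3\bigr] \le c_2\,\sigma_{X^{(n)}}^3$;

(H2.1.3) define $\tilde Y^{(n)} := Y^{(n)} - \mathrm{Cov}(X^{(n)}, Y^{(n)})\,X^{(n)}/\sigma_{X^{(n)}}^2$; there exists $\eta_0 > 0$ such that, for all $s \in [-\pi, \pi]$ and $t \in [0, \eta_0]$,

$$\Bigl|\mathbb{E}\bigl[e^{i(sX^{(n)} + t\tilde Y^{(n)})}\bigr]\Bigr| \le 1 - c_5\bigl(\sigma_{X^{(n)}}^2 s^2 + \sigma_{\tilde Y^{(n)}}^2 t^2\bigr);$$

(H2.1.4) $k_n = N_n\mathbb{E}[X^{(n)}] + O(\sigma_{X^{(n)}} N_n^{1/2})$ (recall that $k_n \in \mathbb{Z}$ and $\mathbb{P}(S_n = k_n) > 0$);

(H2.1.5) $\underline{c}_3 \le \sigma_{Y^{(n)}} := \mathrm{Var}(Y^{(n)})^{1/2} \le \overline{c}_3$;

(H2.1.6) $\rho_{Y^{(n)}} := \mathbb{E}\bigl[|Y^{(n)} - \mathbb{E}[Y^{(n)}]|^3\bigr] \le c_4\,\sigma_{Y^{(n)}}^3$;

(H2.1.7) the correlation $r_n := \mathrm{Cov}(X^{(n)}, Y^{(n)})/(\sigma_{X^{(n)}}\sigma_{Y^{(n)}})$ satisfies $|r_n| \le c_6 < 1$, so that

$$\tau_n^2 := \sigma_{Y^{(n)}}^2(1 - r_n^2) \ge \underline{c}_3^2(1 - c_6^2) > 0.$$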


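Then the following conclusions hold.

2.1.a. There exists $\underline{c}_5 > 0$ such that

$$\mathbb{P}(S_n = k_n) \ge \frac{\underline{c}_5}{2\pi\,\sigma_{X^{(n)}} N_n^{1/2}}.$$

2.1.b. For $N_n \ge N_0 := \max(3, \dots)$, the distribution of $U_n$ (that is, of $T_n$ given $S_n = k_n$) satisfies the Berry-Esseen inequality

$$\sup_{x\in\mathbb{R}}\left|\mathbb{P}\left(\frac{U_n - N_n\mathbb{E}[Y^{(n)}] - r_n\frac{\sigma_{Y^{(n)}}}{\sigma_{X^{(n)}}}\bigl(k_n - N_n\mathbb{E}[X^{(n)}]\bigr)}{N_n^{1/2}\tau_n} \le x\right) - \Phi(x)\right| \le \frac{C}{N_n^{1/2}}, \tag{1}$$

where $\Phi$ denotes the standard normal distribution function, and $C$ is a positive constant that only depends on $\underline{c}_1$, $\overline{c}_1$, $c_2$, $\underline{c}_3$, $\overline{c}_3$, $c_4$, $c_5$, $\underline{c}_5$, and $c_6$.

2.1.c. Moreover, there exist two positive constants $c_7$ and $c_8$, only depending on $\underline{c}_1$, $\overline{c}_1$, $c_2$, $\underline{c}_3$, $\overline{c}_3$, $c_4$, $c_5$, $\underline{c}_5$, and $c_6$, such that

$$\left|\mathbb{E}[U_n] - N_n\mathbb{E}[Y^{(n)}] - r_n\frac{\sigma_{Y^{(n)}}}{\sigma_{X^{(n)}}}\bigl(k_n - N_n\mathbb{E}[X^{(n)}]\bigr)\right| \le c_7 \tag{2}$$

and

$$\bigl|\mathrm{Var}(U_n) - N_n\tau_n^2\bigr| \le c_8 N_n^{1/2}. \tag{3}$$

If $N_n \ge N_0' := \max(N_0, \dots)$, we also have

$$\sup_{x\in\mathbb{R}}\left|\mathbb{P}\left(\frac{U_n - \mathbb{E}[U_n]}{\mathrm{Var}(U_n)^{1/2}} \le x\right) - \Phi(x)\right| \le \frac{C'}{N_n^{1/2}}, \tag{4}$$

where $C'$ is a constant that only depends on $\underline{c}_1$, $\overline{c}_1$, $c_2$, $\underline{c}_3$, $\overline{c}_3$, $c_4$, $c_5$, $\underline{c}_5$, and $c_6$. This result means that $U_n$ is asymptotically normal.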
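As a purely numerical illustration of the $N_n^{-1/2}$ rate in (4) (not part of the proof), one can compute the exact law of the number $W$ of empty urns in the occupancy problem of Subsection 2.3.1 and its Kolmogorov distance to the normal law with the same mean and variance. The sketch below assumes as many balls as urns; the helper names are ours, and the inclusion-exclusion count of surjections is the standard one.

```python
import math
from math import comb


def empty_urn_pmf(N, m):
    """Exact law of W, the number of empty urns when m balls are thrown uniformly into N urns.

    P(W = j) = C(N, j) * surj(m, N - j) / N**m, where surj(m, r) is the number of maps
    from the m balls onto exactly r given urns (inclusion-exclusion, exact integers).
    """
    total = N ** m
    pmf = []
    for j in range(N + 1):
        r = N - j
        surj = sum((-1) ** i * comb(r, i) * (r - i) ** m for i in range(r + 1))
        pmf.append(comb(N, j) * surj / total)
    return pmf


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ks_to_normal(N, m):
    """sup_x |P(W <= x) - Phi((x - E W) / sd(W))| for the occupancy statistic W."""
    p1 = (1 - 1 / N) ** m  # P(a given urn is empty)
    p2 = (1 - 2 / N) ** m  # P(two given urns are both empty)
    mean = N * p1
    var = N * p1 + N * (N - 1) * p2 - mean ** 2
    sd = math.sqrt(var)
    dist, cdf_left = 0.0, 0.0
    for j, p in enumerate(empty_urn_pmf(N, m)):
        g = normal_cdf((j - mean) / sd)
        cdf_right = cdf_left + p
        # the sup is attained at a jump of the step function, from the left or the right
        dist = max(dist, abs(cdf_left - g), abs(cdf_right - g))
        cdf_left = cdf_right
    return dist
```

If the order $N_n^{-1/2}$ in (4) is the right one, $\sqrt{N}$ times `ks_to_normal(N, N)` should stay roughly constant as $N$ grows, which can be checked by comparing, e.g., $N = 50$ and $N = 200$.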

Remark 2.2.

1. The fact that $N_n \to +\infty$ is only required for the existence of the constant $\underline{c}_5$, which relies on Lebesgue's dominated convergence theorem.

2. The set of hypotheses of Theorem 2.1 implies that of the central limit theorem stated in [13, Theorem 2.1], which is not surprising. Notice that, by assumption (H2.1.4), the conditioning value is approximately equal to the mean, as in the central limit theorem given in [13, Theorem 2.3].

3. As a consequence of Proposition 4.4 below, $\underline{c}_1$ can be chosen as $(4c_2)^{-1}$.

4. Assumption (H2.1.7) is not very restrictive, as we will see later in the examples.

5. One should note that 2.1.a is the analogue of Equation (7) of Lemma 3.2 in [9].

6. In the proof, we will replace $Y^{(n)}$ by its projection in order to work with a centered variable which is also uncorrelated with $X^{(n)}$. The variable $\tilde Y^{(n)}$ of (H2.1.3) is introduced for that purpose: it differs from this projection only by an additive constant.

7. If $(X, Y')$ is a pair of random variables with finite second moments whose correlation $r$ satisfies $|r| < 1$, then

$$\bigl|\mathbb{E}\bigl[e^{i(sX + tY')}\bigr]\bigr| = 1 - \tfrac12\bigl(\sigma_X^2 s^2 + 2\sigma_X\sigma_{Y'} r\,st + \sigma_{Y'}^2 t^2\bigr) + o(s^2 + t^2) \le 1 - \tfrac{1 - |r|}{2}\bigl(\sigma_X^2 s^2 + \sigma_{Y'}^2 t^2\bigr) + o(s^2 + t^2),$$

so hypothesis (H2.1.3) is reasonable for i.i.d. sequences.
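The inequality in item 7 only uses the elementary bound $2|ab| \le a^2 + b^2$ with $a = \sigma_X s$ and $b = \sigma_{Y'} t$; written out, as a short derivation added for convenience:

```latex
\sigma_X^2 s^2 + 2\sigma_X\sigma_{Y'} r\,st + \sigma_{Y'}^2 t^2
  \;\ge\; \sigma_X^2 s^2 + \sigma_{Y'}^2 t^2 - |r|\bigl(\sigma_X^2 s^2 + \sigma_{Y'}^2 t^2\bigr)
  \;=\; (1 - |r|)\bigl(\sigma_X^2 s^2 + \sigma_{Y'}^2 t^2\bigr).
```

Consequently, in a small enough neighborhood of the origin (where the $o(s^2 + t^2)$ term is negligible), the modulus of the characteristic function is at most $1 - c(\sigma_X^2 s^2 + \sigma_{Y'}^2 t^2)$ for any fixed $c < (1 - |r|)/2$, which is the local form of (H2.1.3).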






















As mentioned in [13], the result simplifies considerably in the special case when the pair $(X^{(n)}, Y^{(n)})$ does not depend on $n$, that is to say when we consider a single sequence instead of a triangular array. This is a consequence of the following more general corollary.

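Corollary 2.3. Assume that $(X^{(n)}, Y^{(n)}) \to (X, Y)$ in distribution as $n \to \infty$ and that, for every fixed $r > 0$,

$$\limsup_{n\to+\infty}\mathbb{E}\bigl[|X^{(n)}|^r\bigr] < \infty \quad\text{and}\quad \limsup_{n\to+\infty}\mathbb{E}\bigl[|Y^{(n)}|^r\bigr] < \infty.$$

Suppose further that the distribution of $X$ has span 1, that $Y$ is not a.s. equal to an affine function $c + dX$ of $X$, and that $k_n$ and $N_n$ are integers such that $\mathbb{E}[X^{(n)}] = k_n/N_n$ and $N_n \to +\infty$. Then all hypotheses of Theorem 2.1 are satisfied and Theorem 2.1 holds.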


2.3 Applications 

In this section we give several examples borrowed from [13] and [11]. A direct application of Corollary 2.3 
leads to Berry-Esseen bounds in each of them. 


2.3.1 Occupancy problem 

In the classical occupancy problem (see [13] and the references therein for more details), $m$ balls are distributed at random into $N$ urns. The resulting numbers of balls $(Z_1, \dots, Z_N)$ have a multinomial distribution, which equals that of $(X_1, \dots, X_N)$ conditioned on $\sum_{i=1}^N X_i = m$, where $X_1, \dots, X_N$ are i.i.d. with $X_i \sim \mathcal{P}(\lambda)$, for any arbitrary $\lambda > 0$. The classical occupancy problem studies the number $W$ of empty urns, that is, the distribution of $\sum_{i=1}^N \mathbb{1}_{\{X_i = 0\}}$ conditioned on $\sum_{i=1}^N X_i = m$.

Let us follow the work of Janson [13] and suppose that $m = k_n \to \infty$ and $N = N_n \to \infty$ with $k_n/N_n \to \lambda$. Then $W$ can be taken as $U_n$ in Theorem 2.1 with $X^{(n)} \sim \mathcal{P}(\lambda_n)$ and $Y^{(n)} := \mathbb{1}_{\{X^{(n)} = 0\}}$, and we choose $\lambda_n = k_n/N_n$ so that assumption (H2.1.4) holds.

• If $k_n, N_n \to \infty$ such that $k_n/N_n \to \lambda \in (0, \infty)$, then Corollary 2.3 immediately yields that the conclusions of Theorem 2.1 hold.

• In the case $k_n/N_n \to \infty$, assumption (H2.1.1) is clearly violated (since $\sigma_{X^{(n)}}^2 = \lambda_n \to \infty$) and Theorem 2.1 does not apply.

• In the case $k_n/N_n \to 0$, Theorem 2.1 cannot be applied as stated, since $\lambda_n \to 0$ implies that assumption (H2.1.7) does not hold ($r_n \to -1$). As explained in [13], one can choose instead $Y^{(n)} := \mathbb{1}_{\{X^{(n)} = 0\}} + X^{(n)} - 1 = (X^{(n)} - 1)_+$, and it is clearly verified that Theorem 2.1 applies without any extra assumption.
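The Poissonization identity behind this example (i.i.d. Poisson variables conditioned on their sum have the multinomial law, whatever $\lambda > 0$) can be checked by brute force for small $N$ and $m$. The sketch below does so up to floating-point rounding; it is a sanity check rather than an efficient algorithm, and the function names are hypothetical.

```python
import itertools
import math


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)


def compositions(N, m):
    """All (z_1, ..., z_N) of nonnegative integers with z_1 + ... + z_N = m."""
    return (z for z in itertools.product(range(m + 1), repeat=N) if sum(z) == m)


def conditioned_poisson_law(N, m, lam):
    """Law of (X_1, ..., X_N), i.i.d. Poisson(lam), conditioned on X_1 + ... + X_N = m."""
    weights = {z: math.prod(poisson_pmf(k, lam) for k in z) for z in compositions(N, m)}
    total = sum(weights.values())
    return {z: w / total for z, w in weights.items()}


def multinomial_law(N, m):
    """Law of the urn counts when m balls are thrown independently and uniformly into N urns."""
    law = {}
    for z in compositions(N, m):
        coef = math.factorial(m)
        for k in z:
            coef //= math.factorial(k)  # exact: partial quotients are integers
        law[z] = coef / N ** m
    return law
```

The expected number of empty urns under `multinomial_law(N, m)` is $N(1 - 1/N)^m$, which gives a second quick check of the construction.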

2.3.2 Branching processes 

Consider a Galton-Watson process, beginning with one individual, where the number of children of an individual is given by a random variable $X$ having finite moments. Assume further that $\mathbb{E}(X) = 1$. We number the individuals as they appear. Let $X_i$ be the number of children of the $i$-th individual. It is well known (see [13, Example 3.4] and the references therein) that the total progeny is $n \ge 1$ if and only if

$$S_k := \sum_{i=1}^k X_i \ge k \quad\text{for } 0 \le k < n, \quad\text{but}\quad S_n = n - 1. \tag{5}$$

This type of conditioning is different from the one studied in the present paper, but Janson proves [13, Example 3.4] that, if we ignore the order of $X_1, \dots, X_n$, they have the same distribution conditioned on (5) as conditioned on $S_n = n - 1$. Hence our results apply to variables of the kind $Y_i = f(X_i)$. For example, if $Y_i = \mathbb{1}_{\{X_i = 3\}}$, then $\sum_{i=1}^n Y_i$ is the number of families with three children.
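The order-forgetting equivalence quoted from [13] can be checked exactly for small $n$ by enumeration. The sketch below assumes, for illustration only, the offspring law $\mathbb{P}(X = 0) = \mathbb{P}(X = 2) = 1/4$, $\mathbb{P}(X = 1) = 1/2$ (mean 1); it compares the law of the sorted vector $(X_1, \dots, X_n)$ under $S_n = n - 1$ and under condition (5).

```python
import itertools
from collections import defaultdict
from fractions import Fraction


def multiset_laws(offspring, n):
    """Laws of sorted(X_1, ..., X_n) for i.i.d. offspring counts under two conditionings.

    offspring maps each possible number of children to its probability (Fractions, so
    that the comparison is exact). Returns the law given
      (A) S_n = n - 1, and
      (B) S_k >= k for 1 <= k < n and S_n = n - 1, i.e. condition (5).
    """
    law_a, law_b = defaultdict(Fraction), defaultdict(Fraction)
    for xs in itertools.product(offspring, repeat=n):
        if sum(xs) != n - 1:
            continue
        weight = Fraction(1)
        for x in xs:
            weight *= offspring[x]
        key = tuple(sorted(xs))
        law_a[key] += weight
        partial_sums = itertools.accumulate(xs)  # S_1, S_2, ..., S_n
        if all(s >= k for k, s in enumerate(partial_sums, start=1) if k < n):
            law_b[key] += weight
    total_a, total_b = sum(law_a.values()), sum(law_b.values())
    return ({key: w / total_a for key, w in law_a.items()},
            {key: w / total_b for key, w in law_b.items()})
```

The two laws coincide because, by the cycle lemma, exactly one of the $n$ cyclic shifts of any sequence with sum $n - 1$ satisfies (5), and the weight of a sequence only depends on its multiset of values.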
2.3.3 Random forests


Consider a uniformly distributed random labeled rooted forest with $m$ vertices and $N < m$ roots. Without loss of generality, we may assume that the vertices are $1, \dots, m$ and, by symmetry, that the roots are the first $N$ vertices. Following [13], this model can be realized as follows: the sizes of the $N$ trees in the forest are distributed as $X_1, \dots, X_N$ conditioned on $\sum_{i=1}^N X_i = m$, where the $X_i$ are i.i.d. with the Borel distribution for some arbitrary parameter $\lambda \in\, ]0, 1/e]$ (see section 3.3 for more details on the Borel distribution and references therein). Further, tree number $i$ is drawn uniformly among the trees of size $X_i$.

A classical quantity of interest is the number of trees of size $K$ in the forest (see, e.g., [17, 23, 24]). This means that we choose $Y_i = \mathbb{1}_{\{X_i = K\}}$. Let us now assume that we condition on $\sum_{i=1}^N X_i = m$ with $m = k_n \to +\infty$ and $N = N_n \to +\infty$. The framework is similar to that of Subsection 2.3.1 and we proceed analogously. Assume that $k_n/N_n \to a \in\, ]1, +\infty[$ and take $X^{(n)}$ having the Borel distribution with parameter $\lambda_n := (1 - N_n/k_n)\exp(-1 + N_n/k_n)$, which is the choice ensuring $\mathbb{E}[X^{(n)}] = k_n/N_n$ (compare with Subsection 2.3.5).


2.3.4 Bose-Einstein statistics 

This example is borrowed from [11]. Consider $N$ urns. Put $n$ indistinguishable balls in the urns in such a way that each distinguishable outcome has the same probability

$$1\Big/\binom{n + N - 1}{n}$$

(see for example [6]). Let $Z_k$ be the number of balls in the $k$-th urn. It is well known that $(Z_1, \dots, Z_N)$ is distributed as $(X_1, \dots, X_N)$ conditioned on $\sum_{i=1}^N X_i = n$, where $X_1, \dots, X_N$ are i.i.d. and geometrically distributed.
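The geometric representation can be verified exactly on a small case: conditioning i.i.d. geometric variables on their sum makes every composition of $n$ into $N$ parts equally likely, because the weight $(1 - q)^N q^n$ does not depend on the composition. The sketch assumes geometric laws on $\{0, 1, 2, \dots\}$ with an arbitrary ratio $q$ (a choice made here for illustration).

```python
import itertools
from fractions import Fraction
from math import comb


def conditioned_geometric_law(N, n, q):
    """Law of (X_1, ..., X_N) given X_1 + ... + X_N = n, for X_i i.i.d. geometric.

    Assumption (for illustration): P(X = k) = (1 - q) * q**k for k = 0, 1, 2, ...
    Exact arithmetic with Fractions.
    """
    law = {}
    for z in itertools.product(range(n + 1), repeat=N):
        if sum(z) == n:
            w = Fraction(1)
            for k in z:
                w *= (1 - q) * q ** k
            law[z] = w
    total = sum(law.values())
    return {z: w / total for z, w in law.items()}
```

For instance, with $N = 3$ urns and $n = 4$ balls there are $\binom{6}{4} = 15$ outcomes, each receiving probability $1/15$ whatever $q$ is.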


2.3.5 Hashing with linear probing 

Hashing with linear probing can be regarded as throwing $n$ balls sequentially into $m$ urns at random; the urns are arranged in a circle and labeled. A ball that lands in an occupied urn is moved to the next empty urn, always moving in a fixed direction. The length of the move is called the displacement of the ball, and we are interested in the sum $d_{m,n}$ of all displacements. We assume $n < m$ and denote $N = m - n$.

Janson [12] proved that the lengths of the blocks (counting the empty urn) and the sums of displacements inside each block are distributed as $(X_1, Y_1), \dots, (X_N, Y_N)$ conditioned on $\sum_{i=1}^N X_i = m$, where the $(X_i, Y_i)$ are i.i.d. copies of a pair $(X, Y)$ of random variables, $X$ having the Borel distribution with any parameter $\lambda \in\, ]0, e^{-1}]$ (see section 3.3 for more details on the Borel distribution and references therein), and $Y$ given $X = l$ being distributed as $d_{l,l-1}$. As in 2.3.1, we assume that $m = m_n \to \infty$ and $N = N_n \to \infty$ with $m_n/N_n \to a \in [1, +\infty[$, and we write $n = n_n = m_n - N_n$. So, $\lambda_n := (n_n/m_n)\exp(-n_n/m_n) \in [0, e^{-1}[$ and $\lambda_n \to (1 - 1/a)\exp(-1 + 1/a) =: \lambda$. If $X^{(n)}$ has the Borel distribution with parameter $\lambda_n$, Corollary 2.3 yields the desired Berry-Esseen bound.
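Why $\lambda_n = (n_n/m_n)\exp(-n_n/m_n)$: with the Borel law written as in section 3.3, the tree function satisfies $T(\lambda_n) = n_n/m_n$, so $\mathbb{E}[X^{(n)}] = 1/(1 - T(\lambda_n)) = m_n/N_n$, as stated in section 3.2. A minimal numerical check, assuming that normalization of the Borel law; the bisection for $T$ and the truncation of the series are implementation choices made here.

```python
import math


def tree_function(lam):
    """Tree function T(lam) for 0 < lam <= 1/e, i.e. the root in [0, 1] of T * exp(-T) = lam.

    T * exp(-T) is increasing on [0, 1], so plain bisection is enough.
    """
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid * math.exp(-mid) < lam:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def borel_mean(lam, terms=5000):
    """Mean and total mass of P(X = l) = l**(l-1) * lam**l / (l! * T(lam)), l >= 1, truncated."""
    T = tree_function(lam)
    mean = mass = 0.0
    for l in range(1, terms + 1):
        log_p = (l - 1) * math.log(l) + l * math.log(lam) - math.lgamma(l + 1) - math.log(T)
        p = math.exp(log_p)
        mass += p
        mean += l * p
    return mean, mass


def hashing_parameter(n, m):
    """lambda_n = (n/m) * exp(-n/m), the Borel parameter of Subsection 2.3.5."""
    return (n / m) * math.exp(-n / m)
```

For example, $n = 60$ balls and $m = 100$ urns give $T(\lambda_n) = 0.6$ and a mean block length of $100/40 = 2.5$.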


2.4 Conditional large deviation result 

In [9], the authors proved a classical large deviation principle for the conditional distribution $\mathcal{L}_n$ which applies to examples 2.3.1 to 2.3.4. Their result [9, Theorem 2.1] is the analogue of the central limit theorem of Janson [13]. The proof relies on the Gärtner-Ellis theorem, which requires the existence of the Laplace transform in a neighborhood of the origin. In the context of hashing, however, the joint Laplace transform is only defined on $(-\infty, a) \times (-\infty, 0)$ for some $a > 0$, and [9, Theorem 2.1] cannot be applied. Consequently, one needs a specific result in the case when the Laplace transform is not defined in a neighborhood of the origin.


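Theorem 2.4. Suppose that:

(H2.4.1) $\log(\sigma_{X^{(n)}}) = o(N_n^{1/2})$, where $\sigma_{X^{(n)}} := \mathrm{Var}(X^{(n)})^{1/2}$;

(H2.4.2) $\rho_{X^{(n)}} := \mathbb{E}\bigl[|X^{(n)} - \mathbb{E}[X^{(n)}]|^3\bigr] = o\bigl(N_n^{1/2}\sigma_{X^{(n)}}^3\bigr)$;

(H2.4.3) there exists $c > 0$ such that, for all $n \ge 1$ and $s \in [-\pi, \pi]$,

$$\bigl|\mathbb{E}\bigl[e^{isX^{(n)}}\bigr]\bigr| \le 1 - c\,\sigma_{X^{(n)}}^2 s^2;$$

(H2.4.4) $k_n = N_n\mathbb{E}[X^{(n)}] + O(\sigma_{X^{(n)}} N_n^{1/2})$;

(H2.4.5) $\mathrm{Var}(Y^{(n)}) = o(\dots)$;

(H2.4.6) the right tail of $Y^{(n)}$ satisfies: there exist $\alpha > 0$ and $\beta > 0$ such that, for all $y > 0$,

$$\liminf_{n\to\infty}\frac{1}{\sqrt{N_n y}}\log\mathbb{P}\bigl(Y^{(n)} \ge N_n y\bigr) \ge -\beta$$

and

$$\limsup_{n\to\infty}\ \sup_{u \ge \sqrt{N_n y}}\frac{1}{\sqrt{u}}\log\mathbb{P}\bigl(Y^{(n)} \ge u\bigr) \le -\alpha.$$

Then, for all $y > 0$,

$$-\beta\sqrt{y} \le \liminf_{n\to\infty}\frac{1}{\sqrt{N_n}}\log\mathbb{P}\bigl(T_n - \mathbb{E}[T_n \mid S_n = k_n] > N_n y \bigm| S_n = k_n\bigr) \tag{6}$$

$$\le \limsup_{n\to\infty}\frac{1}{\sqrt{N_n}}\log\mathbb{P}\bigl(T_n - \mathbb{E}[T_n \mid S_n = k_n] > N_n y \bigm| S_n = k_n\bigr) \le -\alpha\sqrt{y}. \tag{7}$$

Remark 2.5.

1. Notice the different nature of the assumptions on the standard deviations $\sigma_{X^{(n)}}$ and $\sigma_{Y^{(n)}}$.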

2. The small shift allowed in assumption (H2.4.4) is the same as the one in assumption (H2.1.4) of Theorem 2.1. When the joint Laplace transform is defined in a neighborhood of the origin, one can use exponential changes of probability: a first one, based on the Laplace transform of $X^{(n)}$, reduces the conditioning to the mean $N_n\mathbb{E}[X^{(n)}]$ of $S_n$, whereas the second one relies on the Laplace transform of $Y^{(n)}$ and removes the conditioning, leading to the study of a pair of random variables (see [9]). The large deviation principle is then proved for a larger range of shifts in the conditioning.

This result relies heavily on the following unconditional one.


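Theorem 2.6. For all $n \ge 1$, let $z_n$ be a positive number. Suppose that $N_n \to +\infty$ and that:

(H2.6.1) $\liminf_{n\to\infty} z_n/N_n > 0$;

(H2.6.2) $\mathrm{Var}(Y^{(n)}) = o(\dots)$;

(H2.6.3) the right tail of $Y^{(n)}$ satisfies: there exist $\alpha > 0$ and $\beta > 0$ such that

$$\liminf_{n\to\infty}\frac{1}{\sqrt{z_n}}\log\mathbb{P}\bigl(Y^{(n)} \ge z_n\bigr) \ge -\beta$$

and

$$\limsup_{n\to\infty}\ \sup_{u \ge \sqrt{z_n}}\frac{1}{\sqrt{u}}\log\mathbb{P}\bigl(Y^{(n)} \ge u\bigr) \le -\alpha.$$

Then

$$-\beta \le \liminf_{n\to\infty}\frac{1}{\sqrt{z_n}}\log\mathbb{P}\bigl(T_n - N_n\mathbb{E}[Y^{(n)}] \ge z_n\bigr) \tag{8}$$

$$\le \limsup_{n\to\infty}\frac{1}{\sqrt{z_n}}\log\mathbb{P}\bigl(T_n - N_n\mathbb{E}[Y^{(n)}] \ge z_n\bigr) \le -\alpha. \tag{9}$$

Remark 2.7. Assumption (H2.6.1) naturally implies that $z_n$ goes to infinity with $n$.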
3 Application to hashing with linear probing


In this section we show that the example of hashing with linear probing briefly presented in section 2.3.5 
satisfies the hypotheses of Theorem 2.4. We begin with a precise description of the model. 

3.1 Complements on the model 

Hashing with linear probing is a classical model in theoretical computer science which has been studied from a mathematical point of view by several authors [8, 12, 14, 1, 20]. For more details on the model, we refer to [8, 12, 14]. The model describes the following experiment. One throws $n$ balls sequentially into $m$ urns at random; the urns are arranged in a circle and numbered. A ball that lands in an occupied urn is moved to the next empty urn, always moving in a fixed direction. The length of the move is called the displacement of the ball, and we are interested in the sum of all displacements, which is a random variable denoted by $d_{m,n}$. We assume $n < m$ and define $N = m - n$.

In order to make things clear, let us give an example. Assume that $n = 8$, $m = 10$, and $(6, 9, 1, 9, 9, 6, 2, 5)$ are the addresses where the balls land. This sequence of addresses is called a hash sequence of length $m$ and size $n$. Let $d_i$ be the displacement of ball $i$; then $d_1 = d_2 = d_3 = 0$. Ball number 4 should land in the 9th urn, which is occupied by the second ball; thus it moves one step ahead and lands in urn 10, so that $d_4 = 1$. The 5th ball should land in the 9th urn. Since this is not possible (the urn being occupied by the second ball), it moves to the 10th urn, which is also occupied; it then moves to the first urn (also occupied) and finally to the second urn, so that $d_5 = 3$. And so on: $d_6 = 1$, $d_7 = 1$, $d_8 = 0$. Here, the total displacement equals $1 + 3 + 1 + 1 = 6$. After throwing all balls, there are $N = m - n$ empty urns. These divide the occupied urns into blocks of consecutive urns. For convenience, we consider the empty urn following a block as belonging to this block. In our example, there are two blocks: the first one containing urns 9, 10, 1, 2, 3 (occupied) and urn 4 (empty), and the second one containing urns 5, 6, 7 (occupied) and urn 8 (empty).
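The insertion rule and the block decomposition can be replayed mechanically on this example. The following sketch is a direct simulation of the rule described above (urns numbered from 1, probing in increasing order modulo $m$); the function names are ours, and it assumes $n < m$ so that every ball finds an empty urn.

```python
def linear_probing(addresses, m):
    """Insert balls at the given 1-based addresses into m circularly arranged urns.

    Returns (displacements, occupied) where occupied[u] is True iff urn u + 1 is full.
    Assumes fewer balls than urns, otherwise the probing loop would not terminate.
    """
    occupied = [False] * m
    displacements = []
    for a in addresses:
        u, d = a - 1, 0
        while occupied[u]:
            u = (u + 1) % m  # move to the next urn, wrapping around the circle
            d += 1
        occupied[u] = True
        displacements.append(d)
    return displacements, occupied


def blocks(occupied):
    """Block lengths, each block counting the empty urn that closes it."""
    m = len(occupied)
    empties = [u for u in range(m) if not occupied[u]]
    # a block runs from just after one empty urn up to and including the next empty urn
    return [(empties[(i + 1) % len(empties)] - e) % m or m for i, e in enumerate(empties)]
```

On the hash sequence $(6, 9, 1, 9, 9, 6, 2, 5)$ with $m = 10$, this returns the displacements $(0, 0, 0, 1, 3, 1, 1, 0)$, the empty urns 4 and 8, and two blocks of lengths 4 and 6 whose sum is $m$.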

Janson [12] proved that the lengths of the blocks (counting the last empty urn) and the sums of displacements inside each block are distributed as $(X_1, Y_1), \dots, (X_N, Y_N)$ conditioned on $\sum_{i=1}^N X_i = m$, where the $(X_i, Y_i)$ are i.i.d. copies of a pair $(X, Y)$ of random variables, $X$ having the Borel distribution with any parameter $\lambda \in\, ]0, e^{-1}]$ (see section 3.3 for more details on the Borel distribution and references therein) and the conditional distribution of $Y$ given $X = l$ being the same as the distribution of $d_{l,l-1}$. So, $d_{m,n}$ is distributed as $\sum_{i=1}^N Y_i$ conditioned on $\sum_{i=1}^N X_i = m$. The following lemma presents already known results on the total displacement $d_{n+1,n}$ that will be useful in the proofs.

Lemma 3.1.

1. The number of hash sequences of length $n + 1$ and size $n$ is $(n + 1)^n$.

2. One clearly has $0 \le d_{n+1,n} \le n(n - 1)/2$.

3. For any $y \ge 0$, the function defined from $\mathbb{N}$ to $[0, 1]$ by $n \mapsto \mathbb{P}(d_{n+1,n} \ge y)$ is an increasing function of $n$.

4. The total displacement of any hash sequence $(h_1, \dots, h_n)$ is invariant with respect to any permutation of the $h_i$'s. More precisely, for any permutation $\sigma$ of $\{1, \dots, n\}$, the total displacement associated to the hash sequence $(h_1, \dots, h_n)$ is the same as the total displacement associated to the hash sequence $(h_{\sigma(1)}, \dots, h_{\sigma(n)})$.

Proof of Lemma 3.1. The first three points are obvious. Let us prove the last one, which is a consequence of [12, Lemma 2.1]. For any hash sequence $(h_1, \dots, h_n)$ and for any $i = 0, \dots, n + 1$, let us define

$$Z_i := \mathrm{Card}\{k \in [\![1, n]\!] : h_k = i\}$$

and $\Sigma_i := \sum_{k=1}^{i} Z_k$ (notice that $Z_0 = 0$ and $\Sigma_0 = 0$). It is obvious that the sequence $(\Sigma_i)_{i=0,\dots,n+1}$ does not depend on the order of the hash sequence $(h_1, \dots, h_n)$. Now, formula (2.1) in [12, p. 442] establishes that

$$d_{n+1,n} = \sum_{i=1}^{n+1}(H_i - 1)_+,$$

where $H_i$, the number of items that attempt to be inserted in cell $i$, is related to the sequence $(\Sigma_i)$ by the following formula (see [12, Lemma 2.1]):

$$H_i = \Sigma_i - i - \min_{k < i}(\Sigma_k - k) + 1.$$

Hence $d_{n+1,n}$ does not depend on the order of the hash sequence $(h_1, \dots, h_n)$. □
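Points 1, 2 and 4 of Lemma 3.1 can be confirmed by exhaustive enumeration for small $n$. The sketch below simulates linear probing on $n + 1$ circular cells for every hash sequence; checking each sequence against its sorted rearrangement is enough for point 4, since all rearrangements are enumerated. Point 3 is not checked here.

```python
import itertools


def total_displacement(h, m):
    """Total displacement when balls hashed to 1-based addresses h go into m circular urns."""
    occupied = [False] * m
    total = 0
    for a in h:
        u = a - 1
        while occupied[u]:
            u = (u + 1) % m
            total += 1
        occupied[u] = True
    return total


def check_lemma_3_1(n):
    """Return (number of hash sequences, maximal displacement) for length n + 1 and size n,
    asserting along the way that reordering a hash sequence never changes d_{n+1,n}."""
    m = n + 1
    count, max_d = 0, 0
    for h in itertools.product(range(1, m + 1), repeat=n):
        d = total_displacement(h, m)
        count += 1
        max_d = max(max_d, d)
        assert d == total_displacement(sorted(h), m)  # point 4
    return count, max_d
```

For $n = 1, \dots, 5$ one expects `check_lemma_3_1(n) == ((n + 1) ** n, n * (n - 1) // 2)`, the maximum being attained when all balls share the same address.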

Using the results in [8, 12, 13], we can prove that the joint Laplace transform of $(X, Y)$ is only defined on $(-\infty, a) \times (-\infty, 0)$ for some positive $a$. Hence, Theorem 2.1 of [9] cannot be applied here.

3.2 Large deviations for hashing with linear probing 

In order to provide large deviation bounds for $d_{m,n}$, we need to describe the asymptotic behavior of $\mathbb{P}(Y \ge y)$, which is given in the following proposition.

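Proposition 3.2. Let $\lambda$ be the parameter of the Borel distribution of $X$, and assume that $\kappa := -\log(\lambda) - 1 \ge \log(2)$. Then,

$$-\beta \le \liminf_{y\to+\infty}\frac{1}{\sqrt{y}}\log\mathbb{P}(Y \ge y) \le \limsup_{y\to+\infty}\frac{1}{\sqrt{y}}\log\mathbb{P}(Y \ge y) \le -\alpha, \tag{10}$$

with

$$\alpha := \kappa\sqrt{2} \quad\text{and}\quad \beta := 2(\kappa + 1)\sqrt{1 + \frac{\log 2}{\kappa + 1}}.$$

Now, for all $n \ge 1$, let $m_n$ and $n_n$ be integers such that $n_n < m_n$, and let $N_n := m_n - n_n$. Suppose that $m_n/N_n \to a \in [1, +\infty[$. We introduce $\lambda_n := (n_n/m_n)\exp(-n_n/m_n) \in [0, e^{-1}[$. Hence $\lambda_n \to (1 - 1/a)\exp(-1 + 1/a) =: \lambda$. To apply Proposition 3.2, suppose that $\lambda \le (2e)^{-1}$. Let $(X_i^{(n)}, Y_i^{(n)})_{i=1,\dots,N_n}$ be i.i.d. copies of $(X^{(n)}, Y^{(n)})$, $X^{(n)}$ following the Borel distribution with parameter $\lambda_n$ (so that $\mathbb{E}[X^{(n)}] = m_n/N_n$), and $Y^{(n)}$ given $X^{(n)} = l$ being distributed as $d_{l,l-1}$. Let

$$S_n := \sum_{i=1}^{N_n} X_i^{(n)} \quad\text{and}\quad T_n := \sum_{i=1}^{N_n} Y_i^{(n)}.$$

The total displacement $d_{m_n,n_n}$ is distributed as the conditional distribution of $T_n$ given $S_n = m_n$. Since assumptions (H2.4.1) to (H2.4.5) are also satisfied by $(X_i^{(n)}, Y_i^{(n)})$ $(i = 1, 2, \dots, N_n)$, we can apply Theorem 2.4.

Proposition 3.3 (Large deviations for hashing with linear probing). For $\alpha$ and $\beta$ defined in Proposition 3.2 and $k_n = m_n$, assumptions (H2.4.1) to (H2.4.6) are satisfied. Then, for all $y > 0$,

$$-\beta\sqrt{y} \le \liminf_{n\to\infty}\frac{1}{\sqrt{N_n}}\log\mathbb{P}\bigl(d_{m_n,n_n} - \mathbb{E}[d_{m_n,n_n}] > N_n y\bigr) \le \limsup_{n\to\infty}\frac{1}{\sqrt{N_n}}\log\mathbb{P}\bigl(d_{m_n,n_n} - \mathbb{E}[d_{m_n,n_n}] > N_n y\bigr) \le -\alpha\sqrt{y}.$$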
3.3 Proof of Proposition 3.2

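We start by computing the asymptotic tail behavior of $X$. Recall that $X$ has the Borel distribution with parameter $\lambda \in\, ]0, e^{-1}]$, which means that

$$\mathbb{P}(X = n) = \frac{1}{T(\lambda)}\,\frac{n^{n-1}\lambda^n}{n!}, \qquad n \in \mathbb{N}^*,$$

where $T$ is the well-known tree function (see, e.g., [8] or [12] for more details). We define $\kappa \in [0, +\infty[$ by $\kappa := -\log(\lambda) - 1$.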

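Lemma 3.4.

(i) The asymptotic behavior of $X$ is given by

$$\log\mathbb{P}(X = n) = -\kappa n(1 + o(1)). \tag{11}$$

(ii) The asymptotic tail behavior of $X$ is given by

$$\log\mathbb{P}(X \ge n) = -\kappa n(1 + o(1)). \tag{12}$$

Proof. (i) By Stirling's formula,

$$\log\mathbb{P}(X = n) = \log\left(\frac{(\lambda e)^n}{\sqrt{2\pi}\,T(\lambda)\,n^{3/2}}\,(1 + o(1))\right) = -\kappa n(1 + o(1)).$$

(ii) Similarly, using Stirling's formula,

$$\mathbb{P}(X \ge n) = \sum_{k \ge n}\mathbb{P}(X = k) = \frac{1}{\sqrt{2\pi}\,T(\lambda)}\sum_{k \ge n} e^{-\kappa k(1 + o(1))}.$$

Let $\varepsilon > 0$. Then there exists $n_0 \in \mathbb{N}$ such that, for any $k \ge n_0$, the $o(1)$ term above is bounded by $\varepsilon$ in absolute value. Thus, for any $n \ge n_0$,

$$\sum_{k \ge n} e^{-\kappa k(1 + \varepsilon)} \le \sqrt{2\pi}\,T(\lambda)\,\mathbb{P}(X \ge n) \le \sum_{k \ge n} e^{-\kappa k(1 - \varepsilon)}.$$

Using the fact that $\lambda e < 1$ (that is, $\kappa > 0$), we get

$$\log\left(\frac{1}{\sqrt{2\pi}\,T(\lambda)}\sum_{k \ge n} e^{-\kappa k(1 \pm \varepsilon)}\right) = \log\left(\frac{1}{\sqrt{2\pi}\,T(\lambda)}\,\frac{e^{-\kappa n(1 \pm \varepsilon)}}{1 - e^{-\kappa(1 \pm \varepsilon)}}\right) = -\kappa n(1 \pm \varepsilon)(1 + o(1)),$$

which leads to the required result when $\varepsilon$ goes to 0. □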
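The content of (11) and (12) is that the normalized log-probabilities approach $-\kappa$ only slowly, with a correction of order $(\log n)/n$ coming from the $n^{-3/2}$ factor. The sketch below makes this visible numerically, assuming the normalization of the Borel law recalled above; the parameter value $\lambda = 0.1$ and the truncation of the tail series are arbitrary choices made here.

```python
import math


def tree_function(lam):
    """Tree function T(lam) for 0 < lam <= 1/e: root in [0, 1] of T * exp(-T) = lam (bisection)."""
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid * math.exp(-mid) < lam:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def borel_log_pmf(l, lam, T):
    """log P(X = l) for the Borel law with parameter lam, T = T(lam)."""
    return (l - 1) * math.log(l) + l * math.log(lam) - math.lgamma(l + 1) - math.log(T)


def borel_log_tail(n, lam, extra=400):
    """log P(X >= n), summing `extra` terms of the (geometrically decaying) series via log-sum-exp."""
    T = tree_function(lam)
    logs = [borel_log_pmf(l, lam, T) for l in range(n, n + extra)]
    top = max(logs)
    return top + math.log(sum(math.exp(x - top) for x in logs))
```

With $\lambda = 0.1$ (so $\kappa \approx 1.3026$), the quantity $-\log\mathbb{P}(X \ge n)/n$ should decrease towards $\kappa$ as $n$ runs through $250$, $1000$, $4000$.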


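Proof of the upper bound in (10). Let $y > 0$ and let $n_y$ be the ceiling of the positive solution of $2y = n(n - 1)$:

$$n_y := \left\lceil \frac12 + \sqrt{2y + \frac14}\,\right\rceil. \tag{13}$$

Since $Y$ conditionally on $X = n + 1$ is distributed as $d_{n+1,n}$, and since $d_{n+1,n} \le n(n - 1)/2 < y$ for $n < n_y$ (Lemma 3.1), we get

$$\mathbb{P}(Y \ge y) = \sum_{n=0}^{+\infty}\mathbb{P}(d_{n+1,n} \ge y)\,\mathbb{P}(X = n + 1) \le \sum_{n=n_y}^{+\infty}\mathbb{P}(X = n + 1) \le \mathbb{P}(X \ge n_y).$$

By (12) and the fact that $n_y = \sqrt{2y}\,(1 + o(1))$, we finally conclude that

$$\limsup_{y\to+\infty}\frac{1}{\sqrt{y}}\log\mathbb{P}(Y \ge y) \le -\kappa\sqrt{2} = -\alpha.$$

□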

Proof of the lower bound in (10). Let $y > 0$. For any $m_y \in \mathbb{N}^*$ such that $m_y \ge n_y$, one has

$$\mathbb{P}(Y \ge y) = \sum_{n=n_y}^{+\infty}\mathbb{P}(d_{n+1,n} \ge y)\,\mathbb{P}(X = n + 1) \ge \mathbb{P}(d_{m_y+1,m_y} \ge y)\,\mathbb{P}(X = m_y + 1).$$

So, we are interested in the hash sequences of length $m_y + 1$ and size $m_y$ that realize a total displacement at least equal to $y$. More precisely, we want to evaluate the probability $\mathbb{P}(d_{m_y+1,m_y} \ge y)$, or at least to bound it from below. With that in view, for any $0 < k \le m_y/2$, consider the following hash sequence:

$$(1, 1, 2, 2, \dots, k, k, k + 1, k + 2, \dots, m_y - k). \tag{14}$$

On the one hand, it is composed of $m_y - 2k$ single numbers and $k$ pairs, leading to a hash sequence of size $m_y$ as required. On the other hand, each pair $(q, q)$ $(q = 1, \dots, k)$ realizes a displacement equal to $(q - 1) + q$, while each singleton $q$ $(q = k + 1, \dots, m_y - k)$ realizes a displacement equal to $k$. The total displacement is then $k^2 + k(m_y - 2k) = k(m_y - k)$. It remains to choose $m_y$ and $0 < k \le m_y/2$ such that $k(m_y - k) \ge y$ in order to obtain the best possible lower bound.

Moreover, as mentioned in Lemma 3.1, the total displacement associated to any hash sequence does not depend on its order. One can thus consider all the permutations of the hash sequence defined in (14), whose total number is given by

$$\binom{m_y}{1}\binom{m_y - 1}{1}\cdots\binom{2k + 1}{1}\binom{2k}{2}\binom{2k - 2}{2}\cdots\binom{2}{2} = \frac{m_y!}{2^k}.$$

As a consequence, $\mathbb{P}(Y \ge y)$ is bounded from below by $\frac{m_y!}{(m_y + 1)^{m_y}2^k}\,\mathbb{P}(X = m_y + 1)$. By Stirling's formula, $n! \sim \sqrt{2\pi n}\,(n/e)^n$, and the asymptotic behavior of $X$ given in (11),

$$\log\left(\frac{m_y!}{(m_y + 1)^{m_y}2^k}\,\mathbb{P}(X = m_y + 1)\right) \ge -(\kappa + 1)m_y(1 + o(1)) - k\log 2. \tag{15}$$

Now the inequality $k(m_y - k) \ge y$ admits solutions with $k \le m_y/2$ as soon as $m_y \ge 2\sqrt{y}$. Hence we take $m_y = 2t\sqrt{y}$ for some $t \ge 1$, and (up to integer parts) the smallest admissible $k$ is $k = \sqrt{y}\,(t - \sqrt{t^2 - 1})$. Simple computation shows that the best possible choices for $k$ and $t$ are

$$k = \sqrt{\frac{(\kappa + 1)\,y}{\kappa + 1 + \log 2}} \quad\text{and}\quad t = \sqrt{1 + \frac{(\log 2)^2}{4(\kappa + 1)(\kappa + 1 + \log 2)}}.$$

Plugging these values of $m_y$ and $k$ into (15) leads to the value $-\beta\sqrt{y}\,(1 + o(1))$, which completes the proof of the lower bound. □
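The "simple computation" can be double-checked numerically: after dividing the right-hand side of (15) by $-\sqrt{y}$ and substituting $m_y = 2t\sqrt{y}$ and $k = \sqrt{y}(t - \sqrt{t^2 - 1})$, one has to minimize $t \mapsto 2t(\kappa + 1) + (t - \sqrt{t^2 - 1})\log 2$ over $t \ge 1$, and the minimum should equal $\beta$ of Proposition 3.2. The sketch below (a golden-section search chosen here for simplicity; integer parts ignored) compares the numerical minimum with the closed forms.

```python
import math

LOG2 = math.log(2.0)


def cost(t, kappa):
    """Minus the right-hand side of (15), divided by sqrt(y), for m_y = 2 t sqrt(y)
    and the smallest admissible k = sqrt(y) * (t - sqrt(t**2 - 1)) (integer parts ignored)."""
    return 2 * t * (kappa + 1) + (t - math.sqrt(t * t - 1)) * LOG2


def beta_closed_form(kappa):
    K = kappa + 1
    return 2 * K * math.sqrt(1 + LOG2 / K)


def t_star(kappa):
    K = kappa + 1
    return math.sqrt(1 + LOG2 ** 2 / (4 * K * (K + LOG2)))


def minimize_cost(kappa, lo=1.0, hi=10.0, iters=200):
    """Golden-section search on [lo, hi]; cost(., kappa) is convex on [1, +inf)."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if cost(c, kappa) < cost(d, kappa):
            b = d
        else:
            a = c
    t = 0.5 * (a + b)
    return t, cost(t, kappa)
```

Convexity holds because $t \mapsto -\sqrt{t^2 - 1}$ is convex on $[1, +\infty[$, so the search cannot get stuck at a spurious local minimum.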
4 Proofs


4.1 Notations and technical results 

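The proofs of Theorems 2.1 and 2.4 rely heavily on Fourier transforms. Define $\varphi_n$ and $\psi_n$ by

$$\varphi_n(s, t) := \mathbb{E}\Bigl[\exp\Bigl(is\bigl(X^{(n)} - \mathbb{E}[X^{(n)}]\bigr) + it\bigl(Y^{(n)} - \mathbb{E}[Y^{(n)}]\bigr)\Bigr)\Bigr] \tag{16}$$

and

$$\psi_n(t) := 2\pi\,\mathbb{P}(S_n = k_n)\,\mathbb{E}\Bigl[\exp\Bigl(it\bigl(U_n - N_n\mathbb{E}[Y^{(n)}]\bigr)\Bigr)\Bigr]. \tag{17}$$

In this first section, we establish some properties of these two functions. First notice that $\varphi_n(s, 0) = \mathbb{E}\bigl[e^{is(X^{(n)} - \mathbb{E}[X^{(n)}])}\bigr]$ and $\psi_n(0) = 2\pi\,\mathbb{P}(S_n = k_n)$.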


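Lemma 4.1. One has

$$\psi_n(t) = \frac{1}{\sigma_{X^{(n)}}N_n^{1/2}}\int_{-\pi\sigma_{X^{(n)}}N_n^{1/2}}^{\pi\sigma_{X^{(n)}}N_n^{1/2}} \exp\Bigl(-is\,\frac{k_n - N_n\mathbb{E}[X^{(n)}]}{\sigma_{X^{(n)}}N_n^{1/2}}\Bigr)\,\varphi_n\Bigl(\frac{s}{\sigma_{X^{(n)}}N_n^{1/2}}, t\Bigr)^{N_n}\,ds. \tag{18}$$

Proof. Since

$$\int_{-\pi}^{\pi} e^{is(S_n - k_n)}\,ds = 2\pi\,\mathbb{1}_{\{S_n = k_n\}},$$

we have

$$\psi_n(t) = 2\pi\,\mathbb{P}(S_n = k_n)\,\mathbb{E}\bigl[\exp\bigl(it(U_n - N_n\mathbb{E}[Y^{(n)}])\bigr)\bigr] = 2\pi\,\mathbb{E}\bigl[\exp\bigl(it(T_n - N_n\mathbb{E}[Y^{(n)}])\bigr)\,\mathbb{1}_{\{S_n = k_n\}}\bigr]$$

$$= \int_{-\pi}^{\pi}\mathbb{E}\bigl[\exp\bigl(is(S_n - k_n) + it(T_n - N_n\mathbb{E}[Y^{(n)}])\bigr)\bigr]\,ds = \int_{-\pi}^{\pi} e^{-is(k_n - N_n\mathbb{E}[X^{(n)}])}\,\varphi_n(s, t)^{N_n}\,ds,$$

where the last equality uses the independence of the pairs $(X_i^{(n)}, Y_i^{(n)})$. This leads to the result after the change of variable $s' = s\,\sigma_{X^{(n)}}N_n^{1/2}$. □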
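The identity in the proof of Lemma 4.1 (before the final change of variable) can be tested on a toy example where both sides are computable exactly. The sketch below assumes, purely for illustration, $X$ uniform on $\{0, 1, 2\}$ and $Y = X^2$; the left-hand side is obtained by enumerating all $N$-tuples and the right-hand side by the rectangle rule on $[-\pi, \pi]$, which is exact for trigonometric polynomials of low enough degree.

```python
import cmath
import itertools
import math

# Hypothetical toy law, chosen only for illustration: X uniform on {0, 1, 2}, Y = X**2.
VALUES = (0, 1, 2)
PROBS = (1 / 3, 1 / 3, 1 / 3)


def g(x):
    return x * x


EX = sum(p * x for p, x in zip(PROBS, VALUES))
EY = sum(p * g(x) for p, x in zip(PROBS, VALUES))


def phi(s, t):
    """phi(s, t) of (16): joint characteristic function of (X - EX, Y - EY)."""
    return sum(p * cmath.exp(1j * (s * (x - EX) + t * (g(x) - EY)))
               for p, x in zip(PROBS, VALUES))


def psi_direct(N, k, t):
    """2 pi E[exp(i t (T_N - N EY)) 1{S_N = k}], by enumerating all N-tuples."""
    total = 0j
    for xs in itertools.product(range(len(VALUES)), repeat=N):
        if sum(VALUES[i] for i in xs) == k:
            weight = math.prod(PROBS[i] for i in xs)
            T = sum(g(VALUES[i]) for i in xs)
            total += weight * cmath.exp(1j * t * (T - N * EY))
    return 2 * math.pi * total


def psi_fourier(N, k, t, grid=256):
    """Integral of exp(-i s (k - N EX)) phi(s, t)**N over [-pi, pi], by the rectangle rule.

    For 0 <= k <= max(VALUES) * N the integrand is a trigonometric polynomial of degree
    at most max(VALUES) * N, so the rule is exact (up to rounding) once grid exceeds it.
    """
    h = 2 * math.pi / grid
    acc = 0j
    for j in range(grid):
        s = -math.pi + j * h
        acc += cmath.exp(-1j * s * (k - N * EX)) * phi(s, t) ** N
    return acc * h
```

Setting $t = 0$ recovers $2\pi\,\mathbb{P}(S_N = k)$, which is the special case used in the proof of Proposition 4.5.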

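Lemma 4.2.

(i) Under assumption (H2.1.3), for any integer $l \ge 0$, and for $|s| \le \pi\sigma_{X^{(n)}}N_n^{1/2}$ and $|t| \le \eta_0\sigma_{Y^{(n)}}N_n^{1/2}$,

$$\left|\varphi_n\Bigl(\frac{s}{\sigma_{X^{(n)}}N_n^{1/2}}, \frac{t}{\sigma_{Y^{(n)}}N_n^{1/2}}\Bigr)\right|^{N_n - l} \le e^{-(s^2 + t^2)c_5(N_n - l)/N_n}. \tag{19}$$

(ii) Under assumption (H2.4.3), for any integer $l \ge 0$, and for $|s| \le \pi\sigma_{X^{(n)}}N_n^{1/2}$,

$$\left|\varphi_n\Bigl(\frac{s}{\sigma_{X^{(n)}}N_n^{1/2}}, 0\Bigr)\right|^{N_n - l} \le e^{-s^2 c(N_n - l)/N_n}. \tag{20}$$

Proof. The proof is a mere consequence of the inequality $1 + x \le e^x$. □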
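For (19), the one-line proof can be spelled out as follows, assuming, as in Section 4.2, that $Y^{(n)}$ has already been replaced by its centered and uncorrelated projection, so that $|\varphi_n(s, t)| = |\mathbb{E}[e^{i(sX^{(n)} + t\tilde Y^{(n)})}]|$ and $\sigma_{Y^{(n)}} = \sigma_{\tilde Y^{(n)}}$ (for $t < 0$ one uses $|\varphi_n(-s, -t)| = |\varphi_n(s, t)|$):

```latex
\left|\varphi_n\Bigl(\frac{s}{\sigma_{X^{(n)}}N_n^{1/2}}, \frac{t}{\sigma_{Y^{(n)}}N_n^{1/2}}\Bigr)\right|^{N_n - l}
  \le \Bigl(1 - \frac{c_5(s^2 + t^2)}{N_n}\Bigr)^{N_n - l}
  \le \exp\Bigl(-\frac{c_5(s^2 + t^2)}{N_n}\Bigr)^{N_n - l}
  = e^{-(s^2 + t^2)c_5(N_n - l)/N_n}.
```

The first step is (H2.1.3) at the point $\bigl(s/(\sigma_{X^{(n)}}N_n^{1/2}),\, t/(\sigma_{Y^{(n)}}N_n^{1/2})\bigr) \in [-\pi, \pi] \times [-\eta_0, \eta_0]$, and the second is $0 \le 1 + x \le e^x$ with $x = -c_5(s^2 + t^2)/N_n$ (the left-hand side of (H2.1.3) being nonnegative forces $1 + x \ge 0$). The same computation with $t = 0$ and (H2.4.3) gives (20).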

In the sequel, we also need different controls on the first derivative of $\varphi_n$ with respect to the second variable.
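Lemma 4.3. For any $s$ and $t$, one has:

(i)

$$\left|\frac{\partial\varphi_n}{\partial t}\Bigl(\frac{s}{\sigma_{X^{(n)}}N_n^{1/2}}, \frac{t}{\sigma_{Y^{(n)}}N_n^{1/2}}\Bigr)\right| \le \frac{\sigma_{Y^{(n)}}}{N_n^{1/2}}\bigl(|s| + |t|\bigr); \tag{21}$$

(ii)

$$\left|\frac{\partial\varphi_n}{\partial t}\Bigl(\frac{s}{\sigma_{X^{(n)}}N_n^{1/2}}, \frac{t}{\sigma_{Y^{(n)}}N_n^{1/2}}\Bigr) + \frac{\sigma_{Y^{(n)}}}{N_n^{1/2}}(r_n s + t)\right| \le \frac{\sigma_{Y^{(n)}}}{N_n}\left[\frac{s^2}{2}\Bigl(\frac{\rho_{X^{(n)}}}{\sigma_{X^{(n)}}^3}\Bigr)^{2/3}\Bigl(\frac{\rho_{Y^{(n)}}}{\sigma_{Y^{(n)}}^3}\Bigr)^{1/3} + |st|\Bigl(\frac{\rho_{X^{(n)}}}{\sigma_{X^{(n)}}^3}\Bigr)^{1/3}\Bigl(\frac{\rho_{Y^{(n)}}}{\sigma_{Y^{(n)}}^3}\Bigr)^{2/3} + \frac{t^2}{2}\,\frac{\rho_{Y^{(n)}}}{\sigma_{Y^{(n)}}^3}\right]. \tag{22}$$

Proof. We apply Taylor's theorem to the function $f$ defined by

$$(s, t) \mapsto f(s, t) := \frac{\partial\varphi_n}{\partial t}\Bigl(\frac{s}{\sigma_{X^{(n)}}N_n^{1/2}}, \frac{t}{\sigma_{Y^{(n)}}N_n^{1/2}}\Bigr), \tag{23}$$

which satisfies $f(0, 0) = i\,\mathbb{E}\bigl[Y^{(n)} - \mathbb{E}[Y^{(n)}]\bigr] = 0$. We conclude to (i) using

$$|f(s, t) - f(0, 0)| \le |s|\sup_{\theta,\theta'\in[0,1]}\left|\frac{\partial f}{\partial s}(\theta s, \theta' t)\right| + |t|\sup_{\theta,\theta'\in[0,1]}\left|\frac{\partial f}{\partial t}(\theta s, \theta' t)\right|$$

and to (ii) using

$$\left|f(s, t) - f(0, 0) - s\frac{\partial f}{\partial s}(0, 0) - t\frac{\partial f}{\partial t}(0, 0)\right| \le |st|\sup_{\theta,\theta'\in[0,1]}\left|\frac{\partial^2 f}{\partial t\,\partial s}(\theta s, \theta' t)\right| + \frac{s^2}{2}\sup_{\theta,\theta'\in[0,1]}\left|\frac{\partial^2 f}{\partial s^2}(\theta s, \theta' t)\right| + \frac{t^2}{2}\sup_{\theta,\theta'\in[0,1]}\left|\frac{\partial^2 f}{\partial t^2}(\theta s, \theta' t)\right|.$$

□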


Proposition 4.4.

1. Under assumption (H2.1.2), one has $\sigma_{X^{(n)}} \ge (4c_2)^{-1}$.

2. Under assumption (H2.4.2), one has $\sigma_{X^{(n)}}N_n^{1/2} \to +\infty$.

Proof. The proofs of both results rely on the fact that, for any integer-valued random variable $X$ (see [13, Lemma 4.1]),

$$\sigma_X^2 \le 4\,\mathbb{E}\left[|X - \mathbb{E}[X]|^3\right].$$

The conclusion follows using hypothesis (H2.1.2) (resp. (H2.4.2)).


□ 


Proposition 4.5. We assume hypotheses (H2.1.2), (H2.1.3), and (H2.1.4) (or (H2.4.2), (H2.4.3), and (H2.4.4)). Then there exists $m > 0$ such that

$$\mathbb{P}(S_n = k_n) \ge \frac{m}{2\pi\sigma_{X^{(n)}}N_n^{1/2}}.$$











































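Proof. We only consider the indices $n$ for which $\sigma_{X^{(n)}} < +\infty$. Recall that $\varphi_n(s,0) = \mathbb{E}\left[e^{is(X^{(n)} - \mathbb{E}[X^{(n)}])}\right]$ and, by Lemma 4.1,

$$\psi_n(0) = 2\pi\,\mathbb{P}(S_n = k_n) = \frac{1}{\sigma_{X^{(n)}}N_n^{1/2}}\int_{-\pi\sigma_{X^{(n)}}N_n^{1/2}}^{\pi\sigma_{X^{(n)}}N_n^{1/2}} e^{-isv_n}\,\varphi_n^{N_n}\left(\frac{s}{\sigma_{X^{(n)}}N_n^{1/2}}, 0\right) ds,$$

where $v_n := (k_n - N_n\mathbb{E}[X^{(n)}])/(\sigma_{X^{(n)}}N_n^{1/2})$. Let us prove that the sequence

$$u_n := e^{v_n^2/2}\,\sigma_{X^{(n)}}N_n^{1/2}\,\psi_n(0)$$

converges to $\sqrt{2\pi}$, from which the conclusion follows, since $(v_n)_n$ is bounded by (H2.1.4) (or (H2.4.4)) and $\mathbb{P}(S_n = k_n) > 0$ for all $n$. Inequality (19) with $l = 0$ and $t = 0$ (or (20) with $l = 0$) implies that the sequence $(u_n)_n$ is bounded. Let us prove that $\sqrt{2\pi}$ is the only accumulation point of $(u_n)_n$. Let $\phi(n)$ be such that $(u_{\phi(n)})_n$ converges. Extracting a further subsequence if necessary, we can suppose that $(v_{\phi(n)})_n$ converges, and we set $v := \lim_n v_{\phi(n)}$. Using Taylor's theorem, for each fixed $s$,

$$\varphi_n\left(\frac{s}{\sigma_{X^{(n)}}N_n^{1/2}}, 0\right) = 1 - \frac{s^2}{2N_n} + R_n(s), \qquad |R_n(s)| \le \frac{|s|^3\rho_{X^{(n)}}}{6\sigma^3_{X^{(n)}}N_n^{3/2}} = o\left(\frac{1}{N_n}\right),$$

where the last equality follows from hypothesis (H2.1.2) (or (H2.4.2)). Now, for each fixed $s$,

$$e^{-isv_{\phi(n)}}\,\varphi_{\phi(n)}^{N_{\phi(n)}}\left(\frac{s}{\sigma_{X^{(\phi(n))}}N_{\phi(n)}^{1/2}}, 0\right) \longrightarrow e^{-isv - s^2/2},$$

and, by Lebesgue's dominated convergence theorem and the fact that $\sigma_{X^{(n)}}N_n^{1/2} \to +\infty$ (see Proposition 4.4),

$$\psi_{\phi(n)}(0)\,\sigma_{X^{(\phi(n))}}N_{\phi(n)}^{1/2} \longrightarrow \int_{\mathbb{R}} e^{-isv - s^2/2}\,ds = \sqrt{2\pi}\,e^{-v^2/2},$$

so that $u_{\phi(n)} \to \sqrt{2\pi}$.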


□ 
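The limiting value above is the Fourier transform of the standard Gaussian density, $\int_{\mathbb{R}} e^{-isv - s^2/2}\,ds = \sqrt{2\pi}\,e^{-v^2/2}$. A quick numerical check, offered as a sketch only (the truncation to $[-12, 12]$ and the number of trapezoid nodes are arbitrary choices, not taken from the text):

```python
import cmath
import math


def gaussian_fourier(v: float, half_width: float = 12.0, nodes: int = 4000) -> complex:
    """Trapezoidal approximation of the integral over R of exp(-isv - s^2/2) ds,
    truncated to [-half_width, half_width] (the Gaussian tail is negligible)."""
    h = 2 * half_width / nodes
    total = 0j
    for j in range(nodes + 1):
        s = -half_width + j * h
        weight = 0.5 if j in (0, nodes) else 1.0
        total += weight * cmath.exp(-1j * s * v - s * s / 2)
    return total * h


for v in (0.0, 0.5, 1.7):
    print(v, gaussian_fourier(v), math.sqrt(2 * math.pi) * math.exp(-v * v / 2))
```

The imaginary part vanishes by symmetry, which is why the limit of $u_{\phi(n)}$ is the real number $\sqrt{2\pi}$.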


4.2 Proof of Theorem 2.1 

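Part a) is Proposition 4.5 with $\underline{c}_5 = m$. Now we follow the procedure of Janson [13] to decorrelate $X^{(n)}$ and $Y^{(n)}$ and to center the variable $Y^{(n)}$. We replace $Y^{(n)}$ by the projection

$$Y'^{(n)} := Y^{(n)} - \mathbb{E}\left[Y^{(n)}\right] - \frac{\mathrm{Cov}(X^{(n)}, Y^{(n)})}{\sigma^2_{X^{(n)}}}\left(X^{(n)} - \mathbb{E}\left[X^{(n)}\right]\right).$$

Then $\mathbb{E}[Y'^{(n)}] = 0$ and $\mathrm{Cov}(X^{(n)}, Y'^{(n)}) = \mathbb{E}[X^{(n)}Y'^{(n)}] = 0$. Besides, assumptions (H2.1.3) and (H2.1.7) are verified by $Y'^{(n)}$. By assumption (H2.1.7), since $\sigma^2_{Y'^{(n)}} = (1 - r_n^2)\,\sigma^2_{Y^{(n)}}$, (H2.1.5) is satisfied by $Y'^{(n)}$. Finally, by Minkowski's inequality, assumptions (H2.1.2) and (H2.1.6), and the fact that $|r_n| < 1$,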


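$$\rho^{1/3}_{Y'^{(n)}} = \left\|Y'^{(n)}\right\|_3 \le \left\|Y^{(n)} - \mathbb{E}[Y^{(n)}]\right\|_3 + |r_n|\,\frac{\sigma_{Y^{(n)}}}{\sigma_{X^{(n)}}}\left\|X^{(n)} - \mathbb{E}[X^{(n)}]\right\|_3 \le \rho^{1/3}_{Y^{(n)}} + \sigma_{Y^{(n)}}\,\frac{\rho^{1/3}_{X^{(n)}}}{\sigma_{X^{(n)}}}.$$

Hence $Y'^{(n)}$ satisfies assumption (H2.1.6). Consequently, all conditions hold for the pair $(X^{(n)}, Y'^{(n)})$ too.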

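Finally,

$$T'_n := \sum_{i=1}^{N_n} Y_i'^{(n)} = T_n - N_n\mathbb{E}\left[Y^{(n)}\right] - \frac{\mathrm{Cov}(X^{(n)}, Y^{(n)})}{\sigma^2_{X^{(n)}}}\left(S_n - N_n\mathbb{E}\left[X^{(n)}\right]\right).$$

So, conditioned on $S_n = k_n$, we have $T'_n = T_n - N_n\mathbb{E}[Y^{(n)}] - \frac{\mathrm{Cov}(X^{(n)}, Y^{(n)})}{\sigma^2_{X^{(n)}}}\left(k_n - N_n\mathbb{E}[X^{(n)}]\right)$, a deterministic shift of $T_n$. Hence the conclusions for $T_n$ and $T'_n$ are the same. Thus, it suffices to prove the theorem for $(X^{(n)}, Y'^{(n)})$; in other words, we may henceforth assume that $\mathbb{E}[Y^{(n)}] = \mathbb{E}[X^{(n)}Y^{(n)}] = 0$. Note that in that case $\tau_n = \sigma_{Y^{(n)}}$.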
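The projection step can be checked exactly on a small example. The sketch below uses a hypothetical joint law of $(X, Y)$ on four integer points (an illustration only, not a model from the paper) and exact rational arithmetic to confirm that $Y'$ is centered, uncorrelated with $X$, and has variance $(1 - r^2)\sigma_Y^2$, where $r$ is the correlation coefficient of $X$ and $Y$.

```python
from fractions import Fraction as F

# Hypothetical joint law of (X, Y): {(x, y): probability}.
joint = {(0, 1): F(1, 4), (1, 3): F(1, 4), (2, 2): F(1, 3), (3, 7): F(1, 6)}


def expect(f):
    """E[f(X, Y)] under the joint law above."""
    return sum(p * f(x, y) for (x, y), p in joint.items())


ex, ey = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - ex) ** 2)
var_y = expect(lambda x, y: (y - ey) ** 2)
cov = expect(lambda x, y: (x - ex) * (y - ey))


def y_prime(x, y):
    """Projection Y' = Y - E[Y] - Cov(X, Y) / Var(X) * (X - E[X])."""
    return y - ey - cov / var_x * (x - ex)


mean_yp = expect(y_prime)
cov_x_yp = expect(lambda x, y: (x - ex) * y_prime(x, y))
var_yp = expect(lambda x, y: y_prime(x, y) ** 2)
r_squared = cov * cov / (var_x * var_y)
print(mean_yp, cov_x_yp, var_yp == (1 - r_squared) * var_y)  # 0 0 True
```

Exact fractions are used so that the three identities hold with equality rather than up to rounding.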


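Proof of Theorem 2.1, part b). We follow the classical proof of the Berry-Esseen theorem (see e.g. [7]), combined with the procedure of Quine and Robinson [25], to establish the result of Theorem 2.1.

As shown in Loève [19] (page 285) or Feller [7], the left-hand side of (1) is dominated by

$$\frac{2}{\pi}\int_0^{\eta\sigma_{Y^{(n)}}N_n^{1/2}}\left|\frac{\psi_n\left(u/(\sigma_{Y^{(n)}}N_n^{1/2})\right)}{2\pi\,\mathbb{P}(S_n = k_n)} - e^{-u^2/2}\right|\frac{du}{u} + \frac{24\,\sigma^{-1}_{Y^{(n)}}N_n^{-1/2}}{\eta\pi\sqrt{2\pi}},$$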

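where $\eta > 0$ will be specified later. Set $c_n := 2\pi\,\mathbb{P}(S_n = k_n)\,\sigma_{X^{(n)}}N_n^{1/2}$, so that $c_n \ge \underline{c}_5$ by part a). From Lemma 4.1 and a Taylor expansion of $t \mapsto e^{t^2/2}\psi_n\left(t/(\sigma_{Y^{(n)}}N_n^{1/2})\right)/(2\pi\,\mathbb{P}(S_n = k_n))$, which equals 1 at $t = 0$,

$$\left|\frac{\psi_n\left(u/(\sigma_{Y^{(n)}}N_n^{1/2})\right)}{2\pi\,\mathbb{P}(S_n = k_n)} - e^{-u^2/2}\right| \le \frac{|u|\,e^{-u^2/2}}{c_n}\sup_{0\le\theta\le u}\int_{-\pi\sigma_{X^{(n)}}N_n^{1/2}}^{\pi\sigma_{X^{(n)}}N_n^{1/2}}\left|\frac{\partial}{\partial t}\left[e^{t^2/2}\varphi_n^{N_n}\left(\frac{s}{\sigma_{X^{(n)}}N_n^{1/2}}, \frac{t}{\sigma_{Y^{(n)}}N_n^{1/2}}\right)\right]_{t=\theta}\right| ds. \qquad (24)$$

Now we split the integration domain of $s$ into

$$A_1 := \left\{s : |s| \le \varepsilon\sigma_{X^{(n)}}N_n^{1/2}\right\} \quad\text{and}\quad A_2 := \left\{s : \varepsilon\sigma_{X^{(n)}}N_n^{1/2} < |s| \le \pi\sigma_{X^{(n)}}N_n^{1/2}\right\}$$

(where $0 < \varepsilon < \pi$ will be specified later) and decompose

$$\frac{1}{u}\left|\frac{\psi_n\left(u/(\sigma_{Y^{(n)}}N_n^{1/2})\right)}{2\pi\,\mathbb{P}(S_n = k_n)} - e^{-u^2/2}\right| \le \sup_{0\le\theta\le u}\left[I_1(u,\theta) + I_2(u,\theta)\right], \qquad (25)$$

where

$$I_1(u,\theta) := c_n^{-1}\int_{A_1} e^{-(s^2+u^2)/2}\left|\frac{\partial}{\partial t}\left[e^{(s^2+t^2)/2}\varphi_n^{N_n}\left(\frac{s}{\sigma_{X^{(n)}}N_n^{1/2}}, \frac{t}{\sigma_{Y^{(n)}}N_n^{1/2}}\right)\right]_{t=\theta}\right| ds, \qquad (26)$$

$$I_2(u,\theta) := c_n^{-1}e^{-u^2/2}\int_{A_2}\left|\frac{\partial}{\partial t}\left[e^{t^2/2}\varphi_n^{N_n}\left(\frac{s}{\sigma_{X^{(n)}}N_n^{1/2}}, \frac{t}{\sigma_{Y^{(n)}}N_n^{1/2}}\right)\right]_{t=\theta}\right| ds. \qquad (27)$$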


To bound $I_1(u,\theta)$, we use a result due to Quine and Robinson ([25, Lemma 2]).
Lemma 4.6 (Lemma 2 in [25]). Define

$$l_{1,n} := \rho_{X^{(n)}}\,\sigma^{-3}_{X^{(n)}}N_n^{-1/2} \quad\text{and}\quad l_{2,n} := \rho_{Y^{(n)}}\,\sigma^{-3}_{Y^{(n)}}N_n^{-1/2}.$$

If $l_{1,n} \le 1$ and $l_{2,n} \le 1$, then, for all


{s,t) G R:= |(s,i) : |s| < \t\ < > 


have 


_5 

m L 


(s=+t=)/2 iV„ 


,rl/2 ’ ,,1/2 

ax(n)Nn aY(n)Nn , 


^ C'o(|s| + |t| + + h,n) exp 


(28) 


with 


Co := 98. 


Proof. We refer to the proof in the appendix of [25]. The conditions $l_{1,n} \le 12^{-1/2}$ and $l_{2,n} \le 12^{-1/2}$ appearing in [25, Lemma 2] can be replaced by $l_{1,n} \le (33/32)^{1/2}$ and $l_{2,n} \le (33/32)^{1/2}$, since the factor $8/27$ in (A4) of their proof can be replaced by a factor $1/27$. Since we do not seek the best constants here, we simply suppose $l_{1,n} \le 1$ and $l_{2,n} \le 1$. Finally, $C_0$ has to be greater than 4 and

27[H^2HKH!±^ -(.=+.=)/24 
(«,s)gR 2 (kl + I'Sj + 1)^ 


^54-(|n| + |s|)e-("'+"')/24 


^ 108 - 

V 12 e 


^ 98. 


By assumptions (H2.1.2) and (H2.1.1), 

/i < < 3 -1 at - 1/2 

1/2 3 1 1 

which implies that ax^Nn ^ C 2 c/ Similarly, 

l2,„ ^ c3/V-i/2 ^ clc3a-l,N-^/k 

and ay^Nn^^ ^ ^4 Assume henceforth that 


e := min ( -ciC^jTT ) and r] := min ( - 0304,770 ). 


Lemma 4.7. There exists a positive constant $C_1$ such that

$$\int_0^{\eta\sigma_{Y^{(n)}}N_n^{1/2}}\sup_{0\le\theta\le u} I_1(u,\theta)\,du \le \frac{C_1}{N_n^{1/2}}.$$

Proof. Conditions (31) imply that, on $A_1$,

$$|s| \le \varepsilon\sigma_{X^{(n)}}N_n^{1/2} \le \frac{1}{4}\,l_{1,n}^{-1} \quad\text{and}\quad |\theta| \le |u| \le \eta\sigma_{Y^{(n)}}N_n^{1/2} \le \frac{1}{4}\,l_{2,n}^{-1},$$

which ensures that $(s,\theta) \in R$ as specified in Lemma 4.6. Moreover, since $N_n \ge \max(c_2, c_4)$ (cf. the hypothesis of Theorem 2.1 b)), $l_{1,n} \le 1$ and $l_{2,n} \le 1$. Now, applying Lemma 4.6 in (26) and using part a) of Theorem 2.1, we get






sup Ii{u,9)du 


< c“^Co(/i.„ + y.n) / / (|s| + |m| + 

do JAi 

< + C4) f (|s| + |w| + +“ '^/'^'^dsdu 


and the result follows with 

Cl = c^^Co(c 2 + C 4 ) [ (|s| + \u\ + ^^‘^^dsdu. 

dR2 


□ 


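Now, we study the integral on $A_2$.

Lemma 4.8. There exist positive constants $C_2$ and $C_3$, only depending on $c_1$, $\underline{c}_1$, $c_2$, $\underline{c}_3$, $c_3$, $c_4$, $c_5$, $\underline{c}_5$, and $c_6$, such that

$$\int_0^{\eta\sigma_{Y^{(n)}}N_n^{1/2}}\sup_{0\le\theta\le u} I_2(u,\theta)\,du \le C_2\,e^{-C_3N_n}. \qquad (33)$$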


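Proof. We use the controls (21), (19) (with $l = 1$), and $|\varphi_n| \le 1$ to get, with $\varphi_n$ and $\partial\varphi_n/\partial t$ evaluated at $\left(s/(\sigma_{X^{(n)}}N_n^{1/2}),\ \theta/(\sigma_{Y^{(n)}}N_n^{1/2})\right)$,

$$\left|\frac{\partial}{\partial t}\left[e^{t^2/2}\varphi_n^{N_n}\left(\frac{s}{\sigma_{X^{(n)}}N_n^{1/2}}, \frac{t}{\sigma_{Y^{(n)}}N_n^{1/2}}\right)\right]_{t=\theta}\right| = e^{\theta^2/2}\left|\theta\,\varphi_n^{N_n} + \frac{N_n}{\sigma_{Y^{(n)}}N_n^{1/2}}\,\varphi_n^{N_n-1}\,\frac{\partial\varphi_n}{\partial t}\right| \le e^{\theta^2/2}\,e^{-(s^2+\theta^2)c_5(N_n-1)/N_n}\left(|s| + 2|\theta|\right).$$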

Finally, by (27) and for $N_n \ge 2$, we conclude that


^ 2 c: 


-1 


sup l 2 {u, 9 )du 

^+00 /*+oo 


sup 

ecr^(„) wy^ 


02 


(s + 20) exp I-T TT ( 1 ~ ^''5 


Nn-l 


Nn 




^ 2 £; 


-1 


^+00 /*+oo 


0 JEiy„(n)N, 


^ 2£r^—e 

^ C5 


x(")' 

2^2 


1/2 


(s+2t)e- 


iVnC5£2(7^(^)/2_ 




2A/min(l,C5) 


2 £ 


= -^"'=5e"4(„)/2 


-1 ^ _ 

^ min(l,C5) c5eaxMNn^^ ' 
The conclusion follows with


C 2 ■■= 2S 



/ 

y^min(l,C5) 

V 


min(l, C5) min | |ciC2, tt j Ci 


and 


C'a •= C5 min 


g Cl C2, TT 



(34) 


(35) 

□ 


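To conclude the proof of part b) of Theorem 2.1, just write

$$C_2\,e^{-C_3N_n} = \frac{C_2C_3^{-1/2}}{N_n^{1/2}}\,(C_3N_n)^{1/2}e^{-C_3N_n} \le \frac{C_2C_3^{-1/2}(1/2)^{1/2}e^{-1/2}}{N_n^{1/2}},$$

since $x \mapsto x^{1/2}e^{-x}$ attains its maximum at $x = 1/2$. So,

$$\sup_{x\in\mathbb{R}}\left|\mathbb{P}\left(\frac{U_n - N_n\mathbb{E}[Y^{(n)}]}{\tau_nN_n^{1/2}} \le x\right) - \Phi(x)\right| \le \frac{C}{N_n^{1/2}},$$

with

$$C := C_1 + C_2C_3^{-1/2}(1/2)^{1/2}e^{-1/2}. \qquad (36)$$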

□ 
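The last step uses only that $x \mapsto x^{1/2}e^{-x}$ attains its maximum $(1/2)^{1/2}e^{-1/2}$ at $x = 1/2$, so that $C_2e^{-C_3N_n} \le C_2C_3^{-1/2}(1/2)^{1/2}e^{-1/2}N_n^{-1/2}$ for every $N_n$. A small numerical sketch, in which the values of $C_2$ and $C_3$ are hypothetical placeholders rather than the constants of Lemma 4.8:

```python
import math


def g(x: float) -> float:
    """x -> x^{1/2} e^{-x}, maximal at x = 1/2."""
    return math.sqrt(x) * math.exp(-x)


# Grid search on (0, 10] confirms the maximiser used in the text.
grid = [j / 10000 for j in range(1, 100001)]
x_best = max(grid, key=g)

# Check C2*exp(-C3*N) <= C2*C3^{-1/2}*(1/2)^{1/2}*e^{-1/2} / N^{1/2}
# for hypothetical constants C2, C3 (not taken from the paper).
C2, C3 = 5.0, 0.03
bound_ok = all(
    C2 * math.exp(-C3 * n)
    <= C2 * C3 ** -0.5 * math.sqrt(0.5) * math.exp(-0.5) / math.sqrt(n)
    for n in range(1, 5001)
)
print(x_best, g(x_best), bound_ok)
```

The inequality is an identity in disguise: $C_2e^{-C_3N} = C_2C_3^{-1/2}N^{-1/2}\,g(C_3N)$, so it holds for any positive $C_2$, $C_3$.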


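Proof of Theorem 2.1, part c). We start by proving (2), adapting the proof given in [13]. Using (17) with $\mathbb{E}[Y^{(n)}] = 0$ and differentiating under the integral sign in (18), we have

$$|\mathbb{E}[U_n]| = \left|\frac{-i\psi_n'(0)}{2\pi\,\mathbb{P}(S_n = k_n)}\right| \le \frac{N_n}{2\pi\,\mathbb{P}(S_n = k_n)\,\sigma_{X^{(n)}}N_n^{1/2}}\int_{-\pi\sigma_{X^{(n)}}N_n^{1/2}}^{\pi\sigma_{X^{(n)}}N_n^{1/2}}\left|\frac{\partial\varphi_n}{\partial t}\left(\frac{s}{\sigma_{X^{(n)}}N_n^{1/2}}, 0\right)\right|\left|\varphi_n\left(\frac{s}{\sigma_{X^{(n)}}N_n^{1/2}}, 0\right)\right|^{N_n-1} ds. \qquad (37)$$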
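To make the mean formula above concrete, the sketch below checks, on a hypothetical small joint law of $(X, Y)$ (chosen for illustration, not taken from the paper), that $\mathbb{E}[T_n - N_n\mathbb{E}[Y] \mid S_n = k_n] = -i\psi_n'(0)/(2\pi\mathbb{P}(S_n = k_n))$. It works with the unscaled form of Lemma 4.1 (before the change of variable) and with $\mathbb{E}[e^{isX}]$ in place of $\varphi_n(s, 0)$; the two differ only by the phase $e^{isN_n\mathbb{E}[X]}$, which is absorbed by writing $e^{-isk_n}$ instead of $e^{-is(k_n - N_n\mathbb{E}[X])}$. The left-hand side is computed by brute-force enumeration.

```python
import cmath
import itertools
import math

# Hypothetical joint law of one pair (X, Y), X integer-valued: {(x, y): prob}.
JOINT = {(0, 1.0): 0.2, (1, -0.5): 0.3, (1, 2.0): 0.1, (2, 0.5): 0.4}
EY = sum(p * y for (_, y), p in JOINT.items())


def phi0(s):
    """E[exp(isX)]: the uncentred counterpart of phi_n(s, 0)."""
    return sum(p * cmath.exp(1j * s * x) for (x, _), p in JOINT.items())


def dphi_dt0(s):
    """d/dt E[exp(isX + it(Y - E[Y]))] at t = 0, i.e. i E[(Y - E[Y]) exp(isX)]."""
    return sum(p * 1j * (y - EY) * cmath.exp(1j * s * x)
               for (x, y), p in JOINT.items())


def centred_mean_fourier(n, k):
    """-i psi'(0) / (2 pi P(S = k)); both integrals use a uniform grid that is
    exact for the trigonometric polynomials involved."""
    x_max = max(x for (x, _) in JOINT)
    m = n * x_max + abs(k) + 1
    prob = dpsi = 0j
    for j in range(m):
        s = -math.pi + 2 * math.pi * j / m
        phase = cmath.exp(-1j * s * k)
        prob += phase * phi0(s) ** n
        dpsi += phase * n * phi0(s) ** (n - 1) * dphi_dt0(s)
    # The common factor 2*pi/m of both Riemann sums cancels in the ratio.
    return (-1j * dpsi / prob).real


def centred_mean_enumeration(n, k):
    """E[T - n E[Y] | S = k] by brute-force enumeration of n i.i.d. pairs."""
    num = den = 0.0
    for combo in itertools.product(JOINT.items(), repeat=n):
        weight = math.prod(q for _, q in combo)
        if sum(x for (x, _), _ in combo) == k:
            den += weight
            num += weight * (sum(y for (_, y), _ in combo) - n * EY)
    return num / den


print(centred_mean_fourier(5, 4), centred_mean_enumeration(5, 4))
```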


Using inequality (22) of Lemma 4.3 with r„ = 0 and t = 0, assumptions (H2.1.1), (H2.1.2), and (H2.1.6), 
we deduce 


dipn 


dt 


O'xM -^1 


1/2 


,0 


2 1/3 2/3 2 

^ s PymPxm ^ C2C3C4 2 

^ _ o _ _ S. 


2 


2W 


Then, using inequality (19) of Lemma 4.2 with $t = 0$ and $N_n \ge 2$,
difn I s 




So, (2) holds with


,1/2 


[ajc(n)JVy^ 


,0 




N„-l 


Arl/2 ■ 


,0^ ds if [ s^e. ^^’^''I'^ds. 

j Jr 


:= 

2 c 5 Jr 


(38) 
To prove (3), since $\tau_n = \sigma_{Y^{(n)}}$ and $|\mathbb{E}[U_n]|$ is bounded, it suffices to show that $\left|\mathbb{E}[U_n^2] - N_n\sigma^2_{Y^{(n)}}\right|$ is bounded by some $dN_n^{1/2}$. Proceeding as previously,


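$$\mathbb{E}[U_n^2] = \frac{-\psi_n''(0)}{2\pi\,\mathbb{P}(S_n = k_n)} = -\frac{N_n(N_n-1)}{c_n}\int_{-\pi\sigma_{X^{(n)}}N_n^{1/2}}^{\pi\sigma_{X^{(n)}}N_n^{1/2}} e^{-isv_n}\left(\frac{\partial\varphi_n}{\partial t}\right)^2\varphi_n^{N_n-2}\,ds \qquad (39)$$

$$\qquad\qquad - \frac{N_n}{c_n}\int_{-\pi\sigma_{X^{(n)}}N_n^{1/2}}^{\pi\sigma_{X^{(n)}}N_n^{1/2}} e^{-isv_n}\,\frac{\partial^2\varphi_n}{\partial t^2}\,\varphi_n^{N_n-1}\,ds, \qquad (40)$$

where $\varphi_n$ and its derivatives are evaluated at $\left(s/(\sigma_{X^{(n)}}N_n^{1/2}), 0\right)$, $c_n = 2\pi\,\mathbb{P}(S_n = k_n)\,\sigma_{X^{(n)}}N_n^{1/2}$, and $v_n$ is as in the proof of Proposition 4.5.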

First, by inequality (22) with $r_n = 0$ and $t = 0$, the control (19) with $t = 0$, and for $N_n \ge 3$, one has

, 1/2 / ^ 2 




— 7rcr„(-„-i iV, 


1/2 


difn 


< 


(n)J 

4 2 2 

C2C3C4 


dt 


.^ax(rt)N, 


1/2 


,0 


‘Pn’^ ^ 


^o'xmN, 


1/2 


,0 


dv 


I 


gi^-c,s /3^^ 


(39) 

(40) 


and finally, using part a), the term (39) is bounded by


// . C2C3C4 f 4 _C5S' 

Cg — 


4c5 


s e 


'/^ds. 


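Second, we study the term (40). We want to show that

$$N_n\left|\frac{1}{c_n}\int_{-\pi\sigma_{X^{(n)}}N_n^{1/2}}^{\pi\sigma_{X^{(n)}}N_n^{1/2}} e^{-isv_n}\,\frac{\partial^2\varphi_n}{\partial t^2}\,\varphi_n^{N_n-1}\left(\frac{s}{\sigma_{X^{(n)}}N_n^{1/2}}, 0\right) ds + \sigma^2_{Y^{(n)}}\right|$$

is bounded by some $d'N_n^{1/2}$. Recall that, by Lemma 4.1 and assumption (H2.1.4),

$$\int_{-\pi\sigma_{X^{(n)}}N_n^{1/2}}^{\pi\sigma_{X^{(n)}}N_n^{1/2}} e^{-isv_n}\,\varphi_n^{N_n}\left(\frac{s}{\sigma_{X^{(n)}}N_n^{1/2}}, 0\right) ds = 2\pi\,\mathbb{P}(S_n = k_n)\,\sigma_{X^{(n)}}N_n^{1/2} = c_n,$$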


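so

$$N_n\left|\frac{1}{c_n}\int e^{-isv_n}\,\frac{\partial^2\varphi_n}{\partial t^2}\,\varphi_n^{N_n-1}\left(\frac{s}{\sigma_{X^{(n)}}N_n^{1/2}}, 0\right) ds + \sigma^2_{Y^{(n)}}\right| = \frac{N_n}{c_n}\left|\int_{-\pi\sigma_{X^{(n)}}N_n^{1/2}}^{\pi\sigma_{X^{(n)}}N_n^{1/2}} e^{-isv_n}f(s)\,\varphi_n^{N_n-1}\left(\frac{s}{\sigma_{X^{(n)}}N_n^{1/2}}, 0\right) ds\right|,$$

where

$$f(s) := -\mathbb{E}\left[\left(Y^{(n)}\right)^2e^{is(X^{(n)} - \mathbb{E}[X^{(n)}])/(\sigma_{X^{(n)}}N_n^{1/2})}\right] + \mathbb{E}\left[\left(Y^{(n)}\right)^2\right]\mathbb{E}\left[e^{is(X^{(n)} - \mathbb{E}[X^{(n)}])/(\sigma_{X^{(n)}}N_n^{1/2})}\right]. \qquad (41)$$

Since $f(0) = 0$, applying Taylor's theorem to $f$ yields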


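$$|f(s)| \le \frac{|s|}{\sigma_{X^{(n)}}N_n^{1/2}}\left(\mathbb{E}\left[\left(Y^{(n)}\right)^2\left|X^{(n)} - \mathbb{E}[X^{(n)}]\right|\right] + \mathbb{E}\left[\left(Y^{(n)}\right)^2\right]\mathbb{E}\left[\left|X^{(n)} - \mathbb{E}[X^{(n)}]\right|\right]\right).$$

Thus, using Hölder's inequality,

$$|f(s)| \le \frac{\sigma^2_{Y^{(n)}}\,|s|}{N_n^{1/2}}\left(\frac{\rho^{2/3}_{Y^{(n)}}\rho^{1/3}_{X^{(n)}}}{\sigma^2_{Y^{(n)}}\sigma_{X^{(n)}}} + 1\right),$$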

and, applying part a), assumptions (H2.1.1), (H2.1.2), (H2.1.5), and (H2.1.6), and the bound (19) with $t = 0$, we get

2/3 1/3 


with 

Finally, 

with 


O I/O \ « 

4(") / Jr 

Cg" := C3C^^(1 + C2C4) [ |s| 

|Var([/„) - NnTl\ < C7 + Cg + Cg'A^y^ ^ 


iV, 


1/2 


(42) 


Cg := C7 + c'g' + c'g" 


cic3C4 
2 £5 


+ [ s^e-^^^'/^ds + c,c^\l + C2cl) [ \s\e-^'^^/^ds. 

Jr 4c5 J-^ 


(43) 


Now we turn to the proof of (4). Let us show that the previous estimates of $\mathbb{E}[U_n]$ and $\mathrm{Var}(U_n)$ make it possible to apply (1). Recall that $\mathbb{E}[Y^{(n)}] = 0$. Write


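$$\mathbb{P}\left(\frac{U_n - \mathbb{E}[U_n]}{\mathrm{Var}(U_n)^{1/2}} \le x\right) = \mathbb{P}\left(\frac{U_n}{\tau_nN_n^{1/2}} \le a_nx + b_n\right),$$

where

$$a_n := \frac{\mathrm{Var}(U_n)^{1/2}}{\tau_nN_n^{1/2}} \quad\text{and}\quad b_n := \frac{\mathbb{E}[U_n]}{\tau_nN_n^{1/2}}.$$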

The previous estimates of $\mathbb{E}[U_n]$ and $\mathrm{Var}(U_n)$ yield

kn - 1 | < |a^ - l| ^ cgCg and bn ^ crc^^■ 


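Now,

$$\left|\mathbb{P}\left(\frac{U_n - \mathbb{E}[U_n]}{\mathrm{Var}(U_n)^{1/2}} \le x\right) - \Phi(x)\right| \le \left|\mathbb{P}\left(\frac{U_n}{\tau_nN_n^{1/2}} \le a_nx + b_n\right) - \Phi(a_nx + b_n)\right| + \left|\Phi(a_nx + b_n) - \Phi(x)\right| \le \frac{C}{N_n^{1/2}} + \left|\Phi(a_nx + b_n) - \Phi(x)\right|.$$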
For Nn > 4cg/c§, a„ ^ 1/2 and applying Taylor theorem to $ yields

\^{anX + bn) - ^{x)\ < |(a„- l)a; + 5„|sup 

t y/ZTT 

^ max(c8C3 \c7C^^)(|a;| + i)e-(l^l/2-=7C3')V2^ 

the supremum being over $t$ between $x$ and $a_nx + b_n$. The last function of $x$ being bounded, we get (4) with

Cl := max(c 8 C 3 C 7 cj/^) sup (|a;| + ^ . 

□ 


4.3 Proof of Theorem 2.6 

We start with the proof of Theorem 2.6, which relies on three different lemmas. 

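Proof of Theorem 2.6. Let $(z_n)_n$ be such that $\liminf_{n\to\infty} z_n/N_n > 0$. Since $Y^{(n)} - \mathbb{E}[Y^{(n)}]$ also satisfies the hypotheses, we can assume that $\mathbb{E}[Y^{(n)}] = 0$. Define

$$P_{N_n} := \mathbb{P}(T_n > z_n)$$

and, for any $m \in [\![0, N_n]\!]$,

$$P_{N_n,m} := \mathbb{P}\left(T_n > z_n,\ \forall i \in [\![1, N_n - m]\!]\ Y_i^{(n)} < z_n,\ \forall i \in [\![N_n - m + 1, N_n]\!]\ Y_i^{(n)} \ge z_n\right),$$

with the usual conventions $[\![1, 0]\!] = \emptyset$ and $[\![N_n + 1, N_n]\!] = \emptyset$. Now write

$$P_{N_n} = P_{N_n,0} + N_nP_{N_n,1} + \sum_{m=2}^{N_n}\binom{N_n}{m}P_{N_n,m}. \qquad (44)$$

Using Lemmas 4.9, 4.10, and 4.11 below, we conclude the proof of Theorem 2.6.

Lemma 4.9.
$$\limsup_{n\to\infty}\frac{1}{\sqrt{z_n}}\log(P_{N_n,0}) \le -\alpha.$$

Lemma 4.10.
$$-\beta \le \liminf_{n\to\infty}\frac{1}{\sqrt{z_n}}\log(N_nP_{N_n,1}) \le \limsup_{n\to\infty}\frac{1}{\sqrt{z_n}}\log(N_nP_{N_n,1}) \le -\alpha.$$

Lemma 4.11.
$$\sum_{m=2}^{N_n}\binom{N_n}{m}P_{N_n,m} = o\left(e^{-\alpha\sqrt{z_n}}\right).$$

Proof of Theorem 2.6. Lemmas 4.9, 4.10, and 4.11 yield, for all $\alpha' < \alpha$,

$$-\beta \le \liminf_{n\to\infty}\frac{1}{\sqrt{z_n}}\log(N_nP_{N_n,1}) \le \liminf_{n\to\infty}\frac{1}{\sqrt{z_n}}\log(P_{N_n})$$

and

$$\limsup_{n\to\infty}\frac{1}{\sqrt{z_n}}\log(P_{N_n}) \le \lim_{n\to\infty}\frac{1}{\sqrt{z_n}}\log\left(3e^{-\alpha'\sqrt{z_n}}\right) = -\alpha'.$$

Conclude by letting $\alpha' \to \alpha$. □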
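The decomposition of $P_{N_n}$ into $P_{N_n,0}$, $N_nP_{N_n,1}$ and the binomial sum over $m \ge 2$ is an exact consequence of exchangeability: the event $\{T_n > z_n\}$ splits according to which summands exceed $z_n$, and each choice of $m$ indices has the same probability. The sketch below checks the identity by exhaustive enumeration for a small hypothetical integer-valued law of $Y$ (chosen for illustration only), using exact rational arithmetic.

```python
import itertools
from fractions import Fraction
from math import comb

# Hypothetical law of Y on a few integers: {value: probability}.
LAW = {-2: Fraction(1, 2), 1: Fraction(3, 10), 4: Fraction(1, 5)}


def prob(event, n):
    """P(event(y_1, ..., y_n)) for i.i.d. Y_i with law LAW, by enumeration."""
    total = Fraction(0)
    for ys in itertools.product(LAW, repeat=n):
        weight = Fraction(1)
        for y in ys:
            weight *= LAW[y]
        if event(ys):
            total += weight
    return total


def p_nm(n, m, z):
    """P_{n,m}: T > z, the first n - m values are < z, the last m are >= z."""
    return prob(lambda ys: sum(ys) > z
                and all(y < z for y in ys[: n - m])
                and all(y >= z for y in ys[n - m:]), n)


n, z = 4, 3
lhs = prob(lambda ys: sum(ys) > z, n)
rhs = sum(comb(n, m) * p_nm(n, m, z) for m in range(n + 1))
print(lhs, rhs, lhs == rhs)
```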
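Proof of Lemma 4.11. Let $\alpha' \in\, ]\alpha/2, \alpha[$. Using (9) and noting that $\liminf_n z_n/N_n > 0$, we have, for all $n$ large enough,

$$\sum_{m=2}^{N_n}\binom{N_n}{m}P_{N_n,m} \le \sum_{m=2}^{N_n}\binom{N_n}{m}\,\mathbb{P}\left(Y^{(n)} \ge z_n\right)^m \le \sum_{m=2}^{N_n}\left(N_ne^{-\alpha'\sqrt{z_n}}\right)^m \le \frac{N_n^2\,e^{-2\alpha'\sqrt{z_n}}}{1 - N_ne^{-\alpha'\sqrt{z_n}}} = o\left(e^{-\alpha\sqrt{z_n}}\right),$$

since $2\alpha' > \alpha$ and $N_n = O(z_n)$.

□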
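The bound on the binomial sum in Lemma 4.11 rests on an elementary estimate: since $\binom{N}{m} \le N^m$, one has $\sum_{m=2}^{N}\binom{N}{m}q^m \le \sum_{m\ge2}(Nq)^m = (Nq)^2/(1 - Nq)$ whenever $Nq < 1$, here with $q = \mathbb{P}(Y^{(n)} \ge z_n)$. The sketch below checks this numerically for a few arbitrary $(N, q)$ pairs (not tied to any model in the paper) and compares the sum with its closed form $(1+q)^N - 1 - Nq$.

```python
from math import comb


def tail_sum(n: int, q: float) -> float:
    """sum_{m=2}^{n} C(n, m) q^m, the binomial sum bounded in Lemma 4.11."""
    return sum(comb(n, m) * q**m for m in range(2, n + 1))


def geometric_bound(n: int, q: float) -> float:
    """(n q)^2 / (1 - n q), an upper bound for tail_sum when n q < 1."""
    return (n * q) ** 2 / (1 - n * q)


for n, q in [(10, 0.01), (100, 0.005), (1000, 1e-4)]:
    print(n, q, tail_sum(n, q), (1 + q) ** n - 1 - n * q, geometric_bound(n, q))
```

In the lemma, $Nq \le N_ne^{-\alpha'\sqrt{z_n}} \to 0$, so the bound behaves like $N_n^2e^{-2\alpha'\sqrt{z_n}}$.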


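Proof of Lemma 4.10. First, using (9) and the fact that $\log N_n = o(\sqrt{z_n})$,

$$\limsup_{n\to\infty}\frac{1}{\sqrt{z_n}}\log(N_nP_{N_n,1}) \le \limsup_{n\to\infty}\frac{1}{\sqrt{z_n}}\log\mathbb{P}\left(Y^{(n)} \ge z_n\right) \le -\alpha.$$

Let us prove the converse inequality. Let $\varepsilon > 0$. Writing $T_{n-1} := \sum_{i=1}^{N_n-1}Y_i^{(n)}$, we have

$$P_{N_n,1} = \mathbb{P}\left(T_n > z_n,\ \forall i \in [\![1, N_n - 1]\!]\ Y_i^{(n)} < z_n,\ Y_{N_n}^{(n)} \ge z_n\right) = \int_{[z_n, +\infty)}\mathbb{P}\left(T_{n-1} > z_n - u,\ \forall i \in [\![1, N_n - 1]\!]\ Y_i^{(n)} < z_n\right)\mathbb{P}(Y^{(n)} \in du)$$

$$\ge \int_{[z_n + N_n\varepsilon, +\infty)}\mathbb{P}\left(T_{n-1} > z_n - u,\ \forall i \in [\![1, N_n - 1]\!]\ Y_i^{(n)} < z_n\right)\mathbb{P}(Y^{(n)} \in du) \ge \mathbb{P}\left(T_{n-1} > -N_n\varepsilon,\ \forall i \in [\![1, N_n - 1]\!]\ Y_i^{(n)} < z_n\right)\mathbb{P}\left(Y^{(n)} \ge z_n + N_n\varepsilon\right).$$

Observe that

$$\mathbb{P}\left(T_{n-1} > -N_n\varepsilon,\ \forall i \in [\![1, N_n - 1]\!]\ Y_i^{(n)} < z_n\right) \ge \mathbb{P}\left(Y^{(n)} < z_n\right)^{N_n-1} - \mathbb{P}\left(T_{n-1} \le -N_n\varepsilon\right) \longrightarrow 1.$$

Indeed, $\mathbb{P}(Y^{(n)} < z_n)^{N_n-1} \to 1$, using (9); and, by Chebyshev's inequality and assumption (H2.6.2),

$$\mathbb{P}\left(T_{n-1} \le -N_n\varepsilon\right) \le \frac{\sigma^2_{Y^{(n)}}}{N_n\varepsilon^2} \longrightarrow 0,$$

the random variables $Y_i^{(n)}$ being assumed centered. Finally, using (8) and (H2.6.1), and setting $\delta := \liminf_{n\to\infty} z_n/N_n$, one gets

$$\liminf_{n\to\infty}\frac{1}{\sqrt{z_n}}\log(N_nP_{N_n,1}) \ge \liminf_{n\to\infty}\sqrt{\frac{z_n + N_n\varepsilon}{z_n}}\,\frac{1}{\sqrt{z_n + N_n\varepsilon}}\log\mathbb{P}\left(Y^{(n)} \ge z_n + N_n\varepsilon\right) \ge -\beta\sqrt{\frac{\delta + \varepsilon}{\delta}}.$$

Conclude by letting $\varepsilon \to 0$.

□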


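Proof of Lemma 4.9. Let $\alpha' \in\, ]0, \alpha[$ and $s_n := \alpha'/\sqrt{z_n}$. The exponential Chebyshev inequality for $T_n$ on the event $\{\forall i \in [\![1, N_n]\!],\ Y_i^{(n)} < z_n\}$ yields

$$P_{N_n,0} \le e^{-s_nz_n}\,\mathbb{E}\left[e^{s_nY^{(n)}}\mathbf{1}_{\{Y^{(n)} < z_n\}}\right]^{N_n}.$$

If we prove that

$$\mathbb{E}\left[e^{s_nY^{(n)}}\mathbf{1}_{\{Y^{(n)} < z_n\}}\right] = 1 + o\left(N_n^{-1/2}\right),$$

then

$$\log(P_{N_n,0}) \le -\alpha'\sqrt{z_n} + o\left(N_n^{1/2}\right) = -\alpha'\sqrt{z_n} + o\left(\sqrt{z_n}\right),$$

and the conclusion follows by letting $\alpha' \to \alpha$. Let $\eta \in\, ]3/4, 1[$. Write

$$\mathbb{E}\left[e^{s_nY^{(n)}}\mathbf{1}_{\{Y^{(n)} < z_n\}}\right] = \int_{-\infty}^{\sqrt{z_n}} e^{s_nu}\,\mathbb{P}(Y^{(n)} \in du) + \int_{\sqrt{z_n}}^{z_n - z_n^\eta} e^{s_nu}\,\mathbb{P}(Y^{(n)} \in du) + \int_{z_n - z_n^\eta}^{z_n} e^{s_nu}\,\mathbb{P}(Y^{(n)} \in du) =: I_1 + I_2 + I_3.$$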

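By a Taylor expansion of $t \mapsto e^t$, (H2.6.2), and (H2.6.1), for every $u \le \sqrt{z_n}$ there exists $\theta(u) \le s_n\sqrt{z_n} = \alpha'$ such that $e^{s_nu} = 1 + s_nu + \frac{s_n^2u^2}{2}e^{\theta(u)}$, hence, since $\mathbb{E}[Y^{(n)}] = 0$,

$$I_1 = \int_{-\infty}^{\sqrt{z_n}}\left(1 + s_nu + \frac{s_n^2u^2}{2}e^{\theta(u)}\right)\mathbb{P}(Y^{(n)} \in du) \le 1 + 0 + \frac{s_n^2\sigma^2_{Y^{(n)}}}{2}\,e^{\alpha'} = 1 + \frac{\alpha'^2\sigma^2_{Y^{(n)}}}{2z_n}\,e^{\alpha'} = 1 + o\left(N_n^{-1/2}\right).$$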


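Let $n_0$ be such that, for all $n \ge n_0$ and $u \ge \sqrt{z_n}$, $\log\mathbb{P}(Y^{(n)} \ge u) \le -\alpha'\sqrt{u}$. Suppose $n \ge n_0$. Integrating by parts, we get

$$I_2 = -\left[e^{s_nu}\,\mathbb{P}(Y^{(n)} \ge u)\right]_{\sqrt{z_n}}^{z_n - z_n^\eta} + s_n\int_{\sqrt{z_n}}^{z_n - z_n^\eta} e^{s_nu}\,\mathbb{P}(Y^{(n)} \ge u)\,du \le e^{s_n\sqrt{z_n}}\,\mathbb{P}\left(Y^{(n)} \ge \sqrt{z_n}\right) + s_n\int_{\sqrt{z_n}}^{z_n - z_n^\eta}\exp\left(\alpha'\left(\frac{u}{\sqrt{z_n}} - \sqrt{u}\right)\right) du.$$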


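Since $\sqrt{1 - t} \le 1 - t/2$ for all $t \in [0, 1]$, we get, for all $u \in [\sqrt{z_n}, z_n - z_n^\eta]$ and $n$ large enough to have $z_n^{\eta-1} < 1$,

$$\frac{u}{\sqrt{z_n}} - \sqrt{u} = \sqrt{u}\left(\sqrt{\frac{u}{z_n}} - 1\right) \le \sqrt{u}\left(\sqrt{1 - z_n^{\eta-1}} - 1\right) \le -\frac{\sqrt{u}\,z_n^{\eta-1}}{2} \le -\frac{z_n^{\eta-3/4}}{2}.$$

Hence, since $\eta > 3/4$,

$$I_2 = o\left(N_n^{-1/2}\right).$$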


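Let $\alpha'' \in\, ]\alpha', \alpha \wedge 2\alpha'[$. Let $n_1$ be such that, for all $n \ge n_1$ and $u \ge z_n - z_n^\eta$, $\log\mathbb{P}(Y^{(n)} \ge u) \le -\alpha''\sqrt{u}$. Suppose $n \ge n_1$. Integrating by parts, we get

$$I_3 = -\left[e^{s_nu}\,\mathbb{P}(Y^{(n)} \ge u)\right]_{z_n - z_n^\eta}^{z_n} + s_n\int_{z_n - z_n^\eta}^{z_n} e^{s_nu}\,\mathbb{P}(Y^{(n)} \ge u)\,du \le e^{s_n(z_n - z_n^\eta)}\,\mathbb{P}\left(Y^{(n)} \ge z_n - z_n^\eta\right) + s_n\int_{z_n - z_n^\eta}^{z_n} e^{s_nu - \alpha''\sqrt{u}}\,du.$$

Now, since $\sqrt{t} \ge t$ if $t \in [0, 1]$,

$$e^{s_n(z_n - z_n^\eta)}\,\mathbb{P}\left(Y^{(n)} \ge z_n - z_n^\eta\right) \le \exp\left(\sqrt{z_n}\left(\alpha'\left(1 - z_n^{\eta-1}\right) - \alpha''\left(1 - z_n^{\eta-1}\right)^{1/2}\right)\right) \le \exp\left(\sqrt{z_n}\,(\alpha' - \alpha'')\left(1 - z_n^{\eta-1}\right)\right) = o\left(N_n^{-1/2}\right).$$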

\ ly-n / 
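Finally, applying Taylor's theorem to the function $f(u) = s_nu - \alpha''\sqrt{u}$ around the point $z_n$ yields

$$f(u) = s_nz_n - \alpha''\sqrt{z_n} + \left(\frac{\alpha'}{\sqrt{z_n}} - \frac{\alpha''}{2\sqrt{c}}\right)(u - z_n) = (\alpha' - \alpha'')\sqrt{z_n} + \left(\frac{\alpha'}{\sqrt{z_n}} - \frac{\alpha''}{2\sqrt{c}}\right)(u - z_n),$$

with $c \in [u, z_n]$. Since $\alpha'' < 2\alpha'$ and $c \ge z_n - z_n^\eta$, we have

$$\left(\frac{\alpha'}{\sqrt{z_n}} - \frac{\alpha''}{2\sqrt{c}}\right)(u - z_n) \le 0$$

for $u \in [z_n - z_n^\eta, z_n]$ and $n$ large enough, and we conclude that

$$I_3 = o\left(N_n^{-1/2}\right).$$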


□ 
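The first step of the proof of Lemma 4.9 is the exponential Chebyshev (Chernoff) bound $\mathbb{P}(T > z,\ \forall i\ Y_i < z) \le e^{-sz}\,\mathbb{E}[e^{sY}\mathbf{1}_{\{Y < z\}}]^N$, valid for every $s > 0$ because $\mathbf{1}_{\{T > z\}} \le e^{s(T - z)}$ and the summands are independent. The sketch below checks it by exact enumeration for a hypothetical integer-valued law of $Y$ (an illustration only); in the proof the specific choice is $s_n = \alpha'/\sqrt{z_n}$, whereas here $s$ is arbitrary.

```python
import itertools
import math

# Hypothetical law of Y (integer values): {value: probability}.
LAW = {-1: 0.6, 2: 0.3, 6: 0.1}


def p_n0(n, z):
    """P(T > z and Y_i < z for all i), by enumeration of n i.i.d. copies."""
    total = 0.0
    for ys in itertools.product(LAW, repeat=n):
        if sum(ys) > z and all(y < z for y in ys):
            total += math.prod(LAW[y] for y in ys)
    return total


def chernoff_bound(n, z, s):
    """exp(-s z) * E[exp(sY) 1{Y < z}]^n, an upper bound for every s > 0."""
    truncated_mgf = sum(p * math.exp(s * y) for y, p in LAW.items() if y < z)
    return math.exp(-s * z) * truncated_mgf ** n


for s in (0.1, 0.5, 1.0):
    print(s, p_n0(5, 7), chernoff_bound(5, 7, s))
```

Optimising over $s$ tightens the bound; the proof only needs a choice of order $1/\sqrt{z_n}$, which keeps the truncated moment generating function close to 1.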


4.4 Proof of Theorem 2.4 


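Now we turn to the proof of Theorem 2.4. In order to apply Theorem 2.6, we need the following result, which is analogous to equation (2).

Proposition 4.12. Under assumptions (H2.4.1), (H2.4.3), and (H2.4.5), one has

$$\mathbb{E}[T_n \mid S_n = k_n] = N_n\mathbb{E}\left[Y^{(n)}\right] + o(N_n).$$

Proof. Using inequality (37) and Proposition 4.5, we get

$$\left|\mathbb{E}\left[T_n - N_n\mathbb{E}\left[Y^{(n)}\right] \,\middle|\, S_n = k_n\right]\right| = \left|\frac{-i\psi_n'(0)}{2\pi\,\mathbb{P}(S_n = k_n)}\right| \le \frac{N_n}{m}\int_{-\pi\sigma_{X^{(n)}}N_n^{1/2}}^{\pi\sigma_{X^{(n)}}N_n^{1/2}}\left|\frac{\partial\varphi_n}{\partial t}\left(\frac{s}{\sigma_{X^{(n)}}N_n^{1/2}}, 0\right)\right|\left|\varphi_n\left(\frac{s}{\sigma_{X^{(n)}}N_n^{1/2}}, 0\right)\right|^{N_n-1} ds. \qquad (45)$$

It remains to show that the integral converges to 0. Putting together (45) and (22), and using hypothesis (H2.4.5) and the control (20), one gets

$$\mathbb{E}\left[T_n - N_n\mathbb{E}\left[Y^{(n)}\right] \,\middle|\, S_n = k_n\right] = o(N_n).$$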


□ 


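Proof of Theorem 2.4. Let $y > 0$. Since $(X^{(n)}, Y^{(n)} - \mathbb{E}[Y^{(n)}])$ also satisfies the hypotheses, we can assume that $\mathbb{E}[Y^{(n)}] = 0$. According to Proposition 4.12,

$$y_n := y + \frac{\mathbb{E}[U_n]}{N_n} \longrightarrow y.$$

We have

$$\mathbb{P}\left(U_n - \mathbb{E}[U_n] \ge N_ny\right) = \mathbb{P}\left(T_n - \mathbb{E}[T_n \mid S_n = k_n] \ge N_ny \,\middle|\, S_n = k_n\right) = \frac{\mathbb{P}(T_n \ge N_ny_n,\ S_n = k_n)}{\mathbb{P}(S_n = k_n)} \le \frac{\mathbb{P}(T_n \ge N_ny_n)}{\mathbb{P}(S_n = k_n)}.$$

The conclusion follows using Theorem 2.6, Proposition 4.5, and (H2.4.1).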
Using decomposition (44), we get


P(C/„ - E [[/„] ^ N^y) = P(r„ - E = kn] > Nny\Sn = 

- ^ Nnyn, Sn = kn) , > N V S - k ) 

^ iv„p(r„ > iv„y„, y„(") iVn^n, Vi e [i, 7V„ - ii < iV„ 2 /„, = fc„). 


Define 


QAr„,i := P (t„ > iV„y„, > iV„y„, V^ G p, iV„ - I] = fc„) . 

It remains to show that 


$$\liminf_{n\to\infty}\frac{1}{\sqrt{N_ny}}\log(N_nQ_{N_n,1}) \ge -\beta,$$


which is analogous to the lower bound of Lemma 4.10. We have, for any $\varepsilon > 0$,

= P (t„ > yj") > Nnyn, Vi G [1, - I] y/"^ < N^yn. Sn = kn) 

/*+oo 

= / P ( T „_1 ^ - n, Vi G [1, Nn - I] Sn = kn) P(y(’^) G du) 

JNr^yn. ^ ^ 

p-\-oo 

^ P (t„_i ^ Nnyn -u,yi€ [1, Nn - I] y/"^ < iV„y„, Sn = kn) G du) 

■/Nr^iVn+e) ^ ^ 

> P (t„_i ^ -iV„e, Vz G [1,7V„ - 11 y/") < iV„ 2 /„, Sn = kn) P(y^”) ^ Nn{yn + e)). 
Observe that 

P(r„_l > -Nne, Vi G [1, Nn - 11 y/"^ < Nnyn, Sn = kn) 

> P (y(") < - (1 - P(5„ = kn)) - p (r„-i < -Nne). 

For $\alpha' \in\, ]0, \alpha[$ and $n$ large enough, using (7), one has

P (y(") < iV„y„) ^ (1 - e-“'=1 + 0 • 

By Chebyshev's inequality and hypothesis (H2.4.5), one has straightforwardly

1 


P(yn-1 < —NnS) ^ 


'YM _ 

NnS'^ ^ V N^^ 


Hence, using Proposition 4.5 and hypotheses (H2.4.1) and (H2.4.6),


1 1 / 777- 

liminf -7== \og{NnQn,i) ^ liminf -7== log -^ 

TJ^OO ^Nny y/Nny \ax(^)Nn' 


I + lim inf —;= log P(y 


^ Nn{yn + e)) ^ -/3 


P + e 


Conclude by letting $\varepsilon \to 0$.


□ 